Abstract

Two important similarity measures between sequences are the longest common subsequence 
(LCS) and the dynamic time warping distance (DTWD). The computations of these measures for 
two given sequences are central tasks in a variety of applications. Simple dynamic programming 
algorithms solve these tasks in $O(n^2)$ time, and despite an extensive amount of research, no
algorithms with significantly better worst-case upper bounds are known.

In this paper, we show that an $O(n^{2-\varepsilon})$ time algorithm, for some $\varepsilon > 0$, for computing
the LCS or the DTWD of two sequences of length $n$ over a constant size alphabet refutes the
popular Strong Exponential Time Hypothesis (SETH). Moreover, we show that computing the
LCS of $k$ strings over an alphabet of size $O(k)$ cannot be done in $O(n^{k-\varepsilon})$ time, for any $\varepsilon > 0$,
under SETH. Finally, we also address the time complexity of approximating the DTWD of two
strings in truly subquadratic time.


Supported by NSF Grant CCF-1417238, BSF Grant BSF:2012338 and a Stanford SOE Hoover Fellowship. 



1 Introduction 


In many applications it is desirable to determine the similarity of two or more sequences of letters.
The sequences could be English text, computer viruses, point-wise descriptions of curves in the
plane, or even proteins or DNA sequences. Because of the large variety of applications, there are
many notions of sequence similarity. Some of the most important notions are the Longest Common
Subsequence (LCS), the Edit-Distance, the Dynamic Time Warping Distance (DTWD) and the
Frechet distance measures. Considerable algorithmic research has gone into developing techniques
to compute these measures of similarity. Unfortunately, even when the input consists of two strings,
the time complexity of the problems is not well understood. There are classical algorithms that
compute each of these measures in time that is roughly quadratic in the length of the strings, and
this quadratic running time is essentially the best known. A common technique to explain this quadratic
bottleneck is to reduce the so-called 3SUM problem to the problems at hand. This approach has
enjoyed a tremendous amount of success [GQ12]. Nevertheless, there are no known reductions from
3SUM to the above four sequence similarity problems. Two recent papers [Bri14, BI14] explained
the quadratic bottleneck for Frechet distance and Edit-Distance by a reduction from CNF-SAT,
thus showing that any polynomial improvement over the quadratic running time for these two
problems would imply a breakthrough in SAT algorithms (refuting the Strong Exponential Time
Hypothesis (SETH), which we define below). A natural question is whether the same hypothesis can explain
the quadratic bottleneck for other sequence similarity measures such as DTWD and LCS. This
paper answers this question in the affirmative, providing conditional lower bounds based on SETH
for LCS and DTWD, along with other interesting results.

LCS. Given two strings of $n$ symbols over some alphabet $\Sigma$, the LCS problem asks to compute
the length of the longest sequence that appears as a subsequence in both input strings. It is a very
basic problem that we encounter in undergraduate-level computer science courses, with a classic
$O(n^2)$ dynamic programming algorithm [CLRS09]. LCS has attracted an extensive amount of research,
both for its mathematical simplicity and for its large number of important applications, including
data comparison programs and bioinformatics. In many of these applications, $n$ is large enough to make
the quadratic time algorithm impractical. Despite a long list of improved algorithms for LCS and
its variants in many different settings, e.g. [Hir75, HS77] (see [BHR00] for a survey), the best
algorithms on arbitrary strings are only slightly subquadratic and have an $O(n^2/\log^2 n)$ running
time [MP80] if the alphabet size is constant, and $O(n^2 (\log\log n)/\log^2 n)$ otherwise [BFC08, Gra14].

DTWD. Given two sequences of $n$ points $P_1$ and $P_2$, the dynamic time warping distance between
them is defined as the minimum, over all monotone traversals of $P_1$ and $P_2$, of the sum over
the stages of the traversal of the distance between the corresponding points at that stage (see
the preliminaries for a formal definition). When defined over symbols, the distance between two
symbols is simply 0 if they are equal and 1 otherwise. The DTWD problem asks to compute the
score of the optimal traversal of two given sequences. Note that if, instead of taking the sum over
all the stages of the traversal, we only take the maximum distance, we get the discrete Frechet
distance between the sequences, a well-known measure from computational geometry.

DTWD is an extremely useful similarity measure between temporal sequences which may vary 
in time or speed, and has long been used in speech recognition and more recently in countless data 
mining applications. A simple dynamic programming algorithm solves DTWD in $O(n^2)$ time and
is the best known in terms of worst-case running time, while many heuristics were designed in order
to obtain faster running times in practice (see Wang et al. [WDT+13] for a survey).

Hardness assumption. The Strong Exponential Time Hypothesis (SETH) [IP01, IPZ01] asserts
that for every $\varepsilon > 0$ there is an integer $k \ge 3$ such that $k$-SAT cannot be solved in $O(2^{(1-\varepsilon)n})$ time.
Recently, SETH has been shown to imply many interesting lower bounds for polynomial time
solvable problems [PW10, RV13, AV14, AVW14, Bri14, BI14]. We will base our results on the
following conjecture, which is possibly more plausible than SETH: it is known to be implied by
SETH, yet might still be true even if SETH turns out to be false. See Section 2.2 for a discussion.

Conjecture 1. Given two sets of $n$ vectors $A, B \subseteq \{0,1\}^d$ and an integer $r \ge 0$, there is no $\varepsilon > 0$
and algorithm that can decide whether there is a pair of vectors $a \in A$, $b \in B$ such that $\sum_{i=1}^{d} a_i \cdot b_i \le r$
in $O(n^{2-\varepsilon} \cdot \mathrm{poly}(d))$ time.

Previous work. Out of the many recent SETH-based hardness results, most relevant to our 
work are the following three results concerning sequence similarity measures. 

Abboud, Vassilevska Williams and Weimann [AVW14] proved that a truly sub-quadratic algorithm¹
for alignment problems like Local Alignment and Local-LCS refutes SETH. However, the
“locality” of those measures was heavily used in the reductions, and the results did not imply any
barrier for “global” measures like LCS.

Bringmann [Bri14] proved a similar lower bound for the computation of the Frechet distance.
As mentioned earlier, DTWD is the variant of Frechet in which the “max” is replaced by a
“sum”.

Most recently, Backurs and Indyk [BI14] proved a similar quadratic lower bound for Edit-Distance.
LCS and Edit-Distance are closely related. A simple observation is that computing the LCS is
equivalent to computing the Edit-Distance when only deletions and insertions are allowed, but no
substitutions. Thus, intuitively, LCS seems like an easier version of Edit-Distance, since a
solution has fewer degrees of freedom, and the lower bound for Edit-Distance does not immediately
imply any hardness for LCS.

1.1 Our results 

Our main result is to show that a truly sub-quadratic algorithm for LCS or DTWD refutes Conjecture 1
(and SETH), and should therefore be considered beyond the reach of current algorithmic
techniques, if not impossible. Our results justify the use of sub-quadratic time heuristics and
approximations in practice, and add two important problems to the list of SETH-hard problems.

Theorem 1. If there is an $\varepsilon > 0$ such that either

• LCS over an alphabet of size 7 can be computed in $O(n^{2-\varepsilon})$ time, or

• DTWD over symbols from an alphabet of size 5 can be computed in $O(n^{2-\varepsilon})$ time,

then Conjecture 1 is false.

1 A truly (or strongly) sub-quadratic algorithm is an algorithm with $O(n^{2-\varepsilon})$ running time, for some $\varepsilon > 0$.
We note that the non-existence of an $O(n^{2-\varepsilon})$ algorithm for DTWD between two sequences of
symbols over an alphabet of size 5 implies that there is no $O(n^{2-\varepsilon})$ time algorithm for DTWD
between two sequences of points in $\mathbb{R}^4$ under the Euclidean distance. This follows because we
can choose 5 points in 4-dimensional Euclidean space so that any two of them are at distance 1 from
each other, i.e., the vertices of a regular 4-simplex.
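The embedding argument above can be checked numerically. The following Python sketch uses one possible choice of coordinates for the regular 4-simplex (the unit vectors $e_1,\dots,e_4$ plus the point $(t,t,t,t)$ with $t = (1-\sqrt{5})/4$, all scaled by $1/\sqrt{2}$); this choice, the helper names, and the 5-letter alphabet `abcde` are illustrative assumptions. It confirms that the Euclidean distance between embedded symbols agrees, up to floating-point error, with the 0/1 symbol distance, so every traversal has the same cost in both settings.

```python
import itertools
import math


def regular_4_simplex():
    """Five points in R^4 whose pairwise Euclidean distances are all 1 (one possible choice)."""
    t = (1 - math.sqrt(5)) / 4  # (t, t, t, t) is at distance sqrt(2) from every e_i
    points = [[1.0 if k == i else 0.0 for k in range(4)] for i in range(4)]
    points.append([t] * 4)
    scale = 1 / math.sqrt(2)  # rescale every pairwise distance from sqrt(2) to 1
    return [[scale * c for c in p] for p in points]


def symbol_distance(s, t):
    """0/1 distance between symbols, as in DTWD over symbols."""
    return 0 if s == t else 1


# Hypothetical alphabet of size 5, mapped onto the simplex vertices.
EMBEDDING = dict(zip("abcde", regular_4_simplex()))


def embedded_distance(s, t):
    """Euclidean distance between the simplex vertices assigned to s and t."""
    return math.dist(EMBEDDING[s], EMBEDDING[t])


def max_embedding_error():
    """Largest gap between the two distance functions over all ordered symbol pairs."""
    return max(abs(embedded_distance(s, t) - symbol_distance(s, t))
               for s, t in itertools.product(EMBEDDING, repeat=2))
```

Because DTWD depends on the points only through the distance function, a near-zero `max_embedding_error()` means the two DTWD instances have (numerically) identical optimal costs.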

Next, we consider the problem of computing the LCS of $k \ge 2$ strings, which is also of great
theoretical and practical interest. A simple dynamic programming algorithm solves $k$-LCS in $O(n^k)$
time, and the problem is known to be NP-hard in general, even when the strings are binary [Mai78].
When $k$ is a parameter, the problem is W[1]-hard, even over a fixed size alphabet, by a reduction
from Clique [Pie03]. The parameters of the reduction imply that an $n^{o(k)}$ algorithm for $k$-LCS
would refute ETH², and an algorithm with running time sufficiently faster than $O(n^{k/7})$ would
imply a new algorithm for $k$-Clique. However, no results ruling out $O(n^{k-1})$ or even $O(n^{k/2})$ upper
bounds were known.
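To make the $O(n^k)$ baseline concrete, here is a minimal Python sketch of one standard form of the simple dynamic program for $k$-LCS: memoization over the tuple of current positions, of which there are roughly $n^k$. The function name `k_lcs` is ours, and the recursion is illustrative rather than tuned for long strings.

```python
from functools import lru_cache


def k_lcs(strings):
    """Length of a longest common subsequence of all strings (about n^k memoized states)."""
    k = len(strings)

    @lru_cache(maxsize=None)
    def best(idx):
        # idx[t] = number of characters of strings[t] already consumed
        if any(i == len(s) for i, s in zip(idx, strings)):
            return 0
        heads = {s[i] for i, s in zip(idx, strings)}
        if len(heads) == 1:
            # all current characters agree: matching them in every string is optimal
            return 1 + best(tuple(i + 1 for i in idx))
        # otherwise skip the current character of one of the strings
        return max(best(idx[:t] + (idx[t] + 1,) + idx[t + 1:]) for t in range(k))

    return best(tuple(0 for _ in strings))
```

For example, `k_lcs(["abc", "acb", "bac"])` is 2 (the common subsequence `ac`).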

In this work, we prove that even a slight improvement over the dynamic programming algorithm
is not possible under SETH when the alphabet is of size $O(k)$.

Theorem 2. If there is a constant $\varepsilon > 0$, an integer $k \ge 2$, and an algorithm that can solve $k$-LCS
on strings of length $n$ over an alphabet of size $O(k)$ in $O(n^{k-\varepsilon})$ time, then SETH is false.

A main question we leave open is whether the same lower bound holds when the alphabet size
is a constant independent of $k$. In Section 6 we prove Theorem 2 and make a step towards resolving
the latter question by proving that a problem we call Local-$k$-LCS has such a tight $n^{k-o(1)}$ lower
bound under Conjecture 1, even when the alphabet size is $O(1)$.

Finally, we consider the possibility of truly sub-quadratic algorithms for approximating these 
similarity measures. The LCS and Edit-Distance reductions do not imply any non-trivial hardness 
of approximation. For Frechet in 2-dimensional Euclidean space, Bringmann [Bri14] was able to
rule out truly sub-quadratic 1.0001-approximation algorithms. Here, we show that Bringmann’s
construction implies approximation hardness for DTWD and Frechet when the distance function
between points is arbitrary, and is not required to satisfy the triangle inequality. The details are
presented in Section 5.

1.2 Technical contribution 

Our reductions build upon ideas from previous SETH-based hardness results for sequence alignment
problems, and are most similar to the Edit-Distance reduction of [BI14], with several new ideas in
the constructions and the analysis. As in previous reductions, we will need two kinds of gadgets:
the vector (or assignment) gadgets, and the selection gadgets. Two vector gadgets will be “similar”
iff the two vectors satisfy the property we are interested in (we want to find a pair of vectors that
together satisfy a certain property). The selection gadget construction will make sure that
the existence of a pair of “similar” vector gadgets (i.e., the existence of a pair of vectors with the
property) determines the overall similarity between the sequences. That is, if there is a pair of
vectors satisfying the property, the sequences are more “similar” than if there is none. Typically,
the vector gadgets are easier to analyze, while the selection gadgets might require very careful
arguments.

2 The exponential time hypothesis (ETH) is a weaker version of SETH: it asserts that there is some $\varepsilon > 0$ such
that 3SAT on $n$ variables requires $\Omega(2^{\varepsilon n})$ time.
There are multiple challenges in constructing and analyzing a reduction to LCS. Our first main
contribution was to prove a reduction from a weighted version of LCS (WLCS), in which some
letters are more valuable than others in the optimal solution, to LCS. Reducing problems to WLCS
is a significantly easier and cleaner task than reducing to LCS. Our second main contribution was
in the analysis of the selection gadgets. The approach of [BI14] to analyzing the selection gadgets
involved a case analysis which would have been extremely tedious if applied to LCS. Instead, we
use an inductive argument which significantly decreases the number of cases.

One way to show hardness of DTWD would be to show a reduction from Edit-Distance. However,
we were not able to show such a reduction in general. Instead, we construct a mapping $f$ with
the following property: for the hard instances of Edit-Distance constructed in [BI14],
consisting of two sequences $x$ and $y$, we have $\mathrm{EDIT}(x,y) = \mathrm{DTWD}(f(x), f(y))$. This requires
carefully checking that this equality holds for these particularly structured sequences.

2 Preliminaries 

For an integer n, [n] stands for {1, 2, 3,..., n}. 

2.1 Formal definitions of the similarity measures 

Definition 1 (Longest Common Subsequence). For two sequences $P_1$ and $P_2$ of length $n$ over
an alphabet $\Sigma$, the longest sequence $X$ that appears in both $P_1, P_2$ as a subsequence is the longest
common subsequence (LCS) of $P_1, P_2$, and we say that $\mathrm{LCS}(P_1, P_2) = |X|$. The Longest Common
Subsequence problem asks to output $\mathrm{LCS}(P_1, P_2)$.
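A minimal Python sketch of the classic quadratic dynamic program for Definition 1, assuming the inputs are Python strings or lists; the function name `lcs` is illustrative. Entry `L[i][j]` holds the LCS of the prefixes of lengths `i` and `j`.

```python
def lcs(p1, p2):
    """Length of a longest common subsequence of p1 and p2 in O(|p1| * |p2|) time."""
    n, m = len(p1), len(p2)
    L = [[0] * (m + 1) for _ in range(n + 1)]  # L[i][j] = LCS(p1[:i], p2[:j])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if p1[i - 1] == p2[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1  # extend with the matching symbol
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])  # drop one of the two symbols
    return L[n][m]
```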

Definition 2 (Dynamic time warping distance). For two sequences $x$ and $y$ of $n$ points from a
set $\Sigma$ and a distance function $d : \Sigma \times \Sigma \to \mathbb{R}_{\ge 0}$, the dynamic time warping distance, denoted by
$\mathrm{DTWD}(x,y)$, is the minimum cost of a (monotone) traversal of $x$ and $y$.

A traversal of the two sequences x, y has the following form: We have two markers. Initially, 
one is located at the beginning of x, and the other is located at the beginning of y. At every step, 
one or both of the markers simultaneously move one point forward in their corresponding sequences. 
At the end, both markers must be located at the last point of their corresponding sequence. 

To determine the cost of a traversal, we consider all the $O(n)$ steps of the traversal, and add
up the following quantities to the final cost. Let the configuration of a step be the pair of symbols
$s$ and $t$ that the first and second markers are pointing at, respectively; then the contribution of this
step to the final cost is $d(s,t)$.

The DTWD problem asks to output $\mathrm{DTWD}(x,y)$.

In particular, we will be interested in the following special case of DTWD. 

Definition 3 (DTWD over symbols). The DTWD problem over sequences of symbols is the special
case of DTWD in which the points come from an alphabet $\Sigma$ and the distance function is such that
for any two symbols $s,t \in \Sigma$, $d(s,t) = 1$ if $s \ne t$ and $d(s,t) = 0$ otherwise.
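The traversal of Definition 2 translates into a quadratic dynamic program. The following Python sketch (function names are illustrative) takes the distance function as a parameter, so it covers both general points and the symbol distance of Definition 3; `D[i][j]` is the cheapest traversal that ends with the markers at positions `i` and `j`.

```python
def dtwd(x, y, d):
    """DTWD of sequences x and y under distance function d, in O(|x| * |y|) time."""
    n, m = len(x), len(y)
    inf = float("inf")
    D = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                prev = 0  # initial configuration: both markers at the first point
            else:
                prev = min(
                    D[i - 1][j] if i > 0 else inf,                # only the first marker moved
                    D[i][j - 1] if j > 0 else inf,                # only the second marker moved
                    D[i - 1][j - 1] if i > 0 and j > 0 else inf,  # both markers moved
                )
            D[i][j] = prev + d(x[i], y[j])  # every visited configuration adds its distance
    return D[n - 1][m - 1]


def symbol_distance(s, t):
    """Definition 3: distance 0 between equal symbols and 1 otherwise."""
    return 0 if s == t else 1
```

For instance, `dtwd("aab", "abb", symbol_distance)` is 0, because the traversal can keep equal symbols aligned at every step.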

Besides LCS and DTWD which are central to this work, the following two important measures 
will be referred to in multiple places in the paper. 

Definition 4 (Edit-Distance). For any two sequences $x$ and $y$ over an alphabet $\Sigma$, the edit distance
$\mathrm{EDIT}(x,y)$ is equal to the minimum number of symbol insertions, symbol deletions or symbol
substitutions needed to transform $x$ into $y$. The Edit-Distance problem asks to output $\mathrm{EDIT}(x,y)$
for two given sequences $x,y$.
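Here is a Python sketch of the standard quadratic dynamic program for Definition 4. The flag `allow_substitution` is our own illustrative parameter. Turning substitutions off gives the insertion/deletion-only distance mentioned in the introduction, which equals $|x| + |y| - 2\,\mathrm{LCS}(x,y)$.

```python
def edit_distance(x, y, allow_substitution=True):
    """Minimum number of edits turning x into y (substitutions optional)."""
    n, m = len(x), len(y)
    E = [[0] * (m + 1) for _ in range(n + 1)]  # E[i][j]: cost of turning x[:i] into y[:j]
    for i in range(n + 1):
        E[i][0] = i  # delete all of x[:i]
    for j in range(m + 1):
        E[0][j] = j  # insert all of y[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = min(E[i - 1][j] + 1, E[i][j - 1] + 1)  # deletion or insertion
            if x[i - 1] == y[j - 1]:
                best = min(best, E[i - 1][j - 1])  # equal symbols align for free
            elif allow_substitution:
                best = min(best, E[i - 1][j - 1] + 1)  # substitution
            E[i][j] = best
    return E[n][m]
```

For example, `kitten` and `sitting` have LCS 4 (e.g. `ittn`), so their insertion/deletion distance is 6 + 7 - 2·4 = 5. With substitutions allowed, the distance drops to 3.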

Definition 5 (The discrete Frechet distance). The Frechet distance between two sequences of points
is defined exactly like the DTWD, with the following difference. Instead of defining the cost of a
traversal to be the sum of $d(s,t)$ over all the configurations of points $s$ and $t$ in the traversal,
we define it to be the maximum such distance $d(s,t)$. The Frechet problem asks to compute the
minimum achievable cost of a traversal of two given sequences.
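Since Definition 5 only replaces the sum with a maximum, the DTWD dynamic program carries over directly. A minimal Python sketch with illustrative names; it relies on distances being nonnegative, so the starting value 0 never affects the maximum.

```python
def discrete_frechet(x, y, d):
    """Discrete Frechet distance of x and y under distance function d, in O(|x| * |y|) time."""
    n, m = len(x), len(y)
    inf = float("inf")
    F = [[inf] * m for _ in range(n)]  # F[i][j]: best bottleneck cost ending at (i, j)
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                prev = 0  # distances are nonnegative, so 0 is a neutral start
            else:
                prev = min(
                    F[i - 1][j] if i > 0 else inf,
                    F[i][j - 1] if j > 0 else inf,
                    F[i - 1][j - 1] if i > 0 and j > 0 else inf,
                )
            F[i][j] = max(prev, d(x[i], y[j]))  # max instead of the DTWD sum
    return F[n - 1][m - 1]


def symbol_distance(s, t):
    return 0 if s == t else 1
```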

2.2 Satisfiability and Orthogonal Vectors 

To prove hardness based on Conjecture 1, and therefore SETH, we will show reductions from the
following vector-finding problems.

Definition 6 (Orthogonal Vectors). Given two lists $\{\alpha_i\}_{i\in[n]}$ and $\{\beta_j\}_{j\in[n]}$ of vectors $\alpha_i, \beta_j \in
\{0,1\}^d$, is there a pair that is orthogonal, $\sum_{h=1}^{d} \alpha_i[h] \cdot \beta_j[h] = 0$?

This problem is known under many names and equivalent formulations, e.g. Batched Partial
Match, Disjoint Pair, and Orthogonal Pair. Starting with the reduction of Williams [Wil05], this
problem or variants of it have been used in every hardness result for a problem in P that is based
on SETH, via the following theorem.

Theorem 3 (Williams [Wil05]). If for some $\varepsilon > 0$, Orthogonal Vectors on $n$ vectors in $\{0,1\}^d$ for
$d = O(\log n)$ can be solved in $O(n^{2-\varepsilon})$ time, then CNF-SAT on $n$ variables and $\mathrm{poly}(n)$ clauses can
be solved in $O(2^{(1-\varepsilon/2)n}\,\mathrm{poly}(n))$ time, and SETH is false.

The proof of this theorem is via the split-and-list technique and will follow from the proof of
Lemma 1 below. The following is a more general version of the Orthogonal Vectors problem.

Definition 7 (Most-Orthogonal Vectors). Given two lists $\{\alpha_i\}_{i\in[n]}$ and $\{\beta_j\}_{j\in[n]}$ of vectors $\alpha_i, \beta_j \in
\{0,1\}^d$ and an integer $r \in \{0,\dots,d\}$, is there a pair that has inner product at most $r$,
$\sum_{h=1}^{d} \alpha_i[h] \cdot \beta_j[h] \le r$? We call any two vectors that satisfy this condition ($r$-)far, and ($r$-)close
vectors otherwise.
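For reference, the trivial $O(n^2 d)$ algorithm for Definition 7 tries all pairs, and Conjecture 1 asserts that no $O(n^{2-\varepsilon}\,\mathrm{poly}(d))$ algorithm exists. A Python sketch with an illustrative name, in which $r = 0$ gives Orthogonal Vectors (Definition 6):

```python
def find_far_pair(alphas, betas, r):
    """Return (i, j) with <alphas[i], betas[j]> <= r (an r-far pair), or None if all pairs are r-close."""
    for i, a in enumerate(alphas):
        for j, b in enumerate(betas):
            if sum(ah * bh for ah, bh in zip(a, b)) <= r:  # O(d) inner product
                return i, j
    return None
```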

Clearly, an $O(n^{2-\varepsilon})$ algorithm for Most-Orthogonal Vectors on $d$ dimensions implies a similar
algorithm for Orthogonal Vectors, while the other direction might not be true. In fact, while faster,
mildly sub-quadratic algorithms are known for Orthogonal Vectors when $d$ is polylogarithmic, with
$O(n^2/\mathrm{superpolylog}(n))$ running times [CIP02, ILPS14, AWY15], we are not aware of any such
algorithms for Most-Orthogonal Vectors.

Lemma 1 below shows that such algorithms would imply new $O(2^n/\mathrm{superpoly}(n))$ algorithms
for MAX-CNF-SAT on a polynomial number of clauses. While such upper bounds are known for
CNF-SAT [AWY15, DH09], to our knowledge, $o(2^n)$ upper bounds are known for MAX-CNF-SAT
only when the number of clauses is linear in the number of variables [DW06, CK04]. Together with the
fact that the reductions from Most-Orthogonal Vectors to LCS, DTWD and Edit-Distance incur
only a polylogarithmic overhead, this implies that shaving a superpolylogarithmic factor off the
quadratic running times for these problems might be difficult. The possibility of such improvements
for pattern matching problems like Edit-Distance was recently suggested by Williams [Wil14], as
another potential application of his breakthrough technique for All-Pairs-Shortest-Paths.

More importantly, Lemma 1 shows that refuting Conjecture 1 implies an $O(2^{(1-\varepsilon)n}\,\mathrm{poly}(n))$
algorithm for MAX-CNF-SAT and therefore refutes SETH.
Lemma 1. If Most-Orthogonal Vectors on $n$ vectors in $\{0,1\}^d$ can be solved in $T(n,d)$ time, then
given a CNF formula on $n$ variables and $M$ clauses, we can compute the maximum number of
satisfiable clauses (MAX-CNF-SAT) in $O(T(2^{n/2}, M) \cdot \log M)$ time.

Proof. Given a CNF formula on $n$ variables and $M$ clauses, split the variables into two sets of
size $n/2$ and list all $2^{n/2}$ partial assignments to each set. Define a vector $v(\alpha)$ for each partial
assignment $\alpha$ which contains a 0 at coordinate $j \in [M]$ if $\alpha$ sets any of the literals of the $j$-th clause
of the formula to true, and 1 otherwise. In other words, it contains a 0 if the partial assignment
satisfies the clause and 1 otherwise. Now, observe that if $\alpha, \beta$ are a pair of partial assignments for
the first and second set of variables, then the inner product of $v(\alpha)$ and $v(\beta)$ is equal to the number
of clauses that the combined assignment $(\alpha, \beta)$ does not satisfy. Therefore, to find the assignment
that maximizes the number of satisfied clauses, it is enough to find a pair of partial assignments
$\alpha, \beta$ such that the inner product of $v(\alpha), v(\beta)$ is minimized. The latter can be easily reduced to
$O(\log M)$ calls to an oracle for Most-Orthogonal Vectors on $N = 2^{n/2}$ vectors in $\{0,1\}^M$ with a
standard binary search. □
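The split-and-list argument in the proof of Lemma 1 can be sketched in Python as follows. Assumptions: clauses are lists of nonzero integers in the common DIMACS style (`3` for $x_3$, `-3` for its negation), and a brute-force search stands in for the Most-Orthogonal Vectors oracle, so the sketch illustrates the reduction, not its running time. All function names are hypothetical.

```python
import itertools


def unsatisfied_vector(partial, clauses):
    """v(alpha): coordinate j is 0 if the partial assignment satisfies clause j, else 1."""
    return [0 if any(abs(l) in partial and partial[abs(l)] == (l > 0) for l in clause) else 1
            for clause in clauses]


def partial_assignments(variables):
    """All assignments (as dicts) to the given variables."""
    for bits in itertools.product([False, True], repeat=len(variables)):
        yield dict(zip(variables, bits))


def exists_pair_at_most(U, V, r):
    """Brute-force stand-in for a Most-Orthogonal Vectors oracle."""
    return any(sum(a * b for a, b in zip(u, v)) <= r for u in U for v in V)


def max_sat_split_and_list(num_vars, clauses):
    half = num_vars // 2
    U = [unsatisfied_vector(p, clauses) for p in partial_assignments(list(range(1, half + 1)))]
    V = [unsatisfied_vector(p, clauses) for p in partial_assignments(list(range(half + 1, num_vars + 1)))]
    # Binary search for the minimum inner product, i.e. the fewest unsatisfied clauses.
    lo, hi = 0, len(clauses)
    while lo < hi:
        mid = (lo + hi) // 2
        if exists_pair_at_most(U, V, mid):
            hi = mid
        else:
            lo = mid + 1
    return len(clauses) - lo


def max_sat_brute_force(num_vars, clauses):
    """Reference answer: try every full assignment."""
    return max(len(clauses) - sum(unsatisfied_vector(a, clauses))
               for a in partial_assignments(list(range(1, num_vars + 1))))
```

The binary search makes $O(\log M)$ oracle calls, matching the bound in Lemma 1. The correctness rests on the observation in the proof: a clause is unsatisfied by the combined assignment exactly when both halves leave it unsatisfied.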

By the above discussion, a lower bound that is based on Most-Orthogonal Vectors can be 
considered stronger than one that is only based on SETH. 

3 Hardness for LCS 

In this section we provide evidence for the hardness of the Longest Common Subsequence problem,
and prove the first item in Theorem 1.

As an intermediate step, we first show evidence that solving a more general version of the
problem in strongly subquadratic time is impossible under Conjecture 1.

Definition 8 (Weighted Longest Common Subsequence (WLCS)). For two sequences $P_1$ and $P_2$ of
length $n$ over an alphabet $\Sigma$ and a weight function $w : \Sigma \to [K]$, let $X$ be the sequence that appears
in both $P_1, P_2$ as a subsequence and maximizes the expression $W(X) = \sum_{i=1}^{|X|} w(X[i])$. We say that
$X$ is the WLCS of $P_1, P_2$ and write $\mathrm{WLCS}(P_1, P_2) = W(X)$. The Weighted Longest Common
Subsequence problem asks to output $\mathrm{WLCS}(P_1, P_2)$.

Note that a common subsequence $X$ of two sequences $P_1, P_2$ can be thought of as an alignment
or a matching $A = \{(a_i, b_i)\}_{i=1}^{|X|}$ between the two sequences, so that for all $i \in [|X|]$: $P_1[a_i] = P_2[b_i]$,
and $a_1 < \dots < a_{|X|}$ and $b_1 < \dots < b_{|X|}$. Clearly, the weight $\sum_{i=1}^{|X|} w(P_1[a_i]) = \sum_{i=1}^{|X|} w(P_2[b_i])$ of the
matching $A$ corresponds to the weighted length $W(X)$ of the common subsequence $X$.
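Under the matching view above, WLCS is computed by the same quadratic dynamic program as LCS, with a matched pair contributing the weight of its symbol instead of 1. Here is a minimal Python sketch, assuming the weight function is given as a dictionary `w` that maps symbols to positive integers (the name `wlcs` is illustrative):

```python
def wlcs(p1, p2, w):
    """Maximum total weight of a common subsequence of p1 and p2."""
    n, m = len(p1), len(p2)
    W = [[0] * (m + 1) for _ in range(n + 1)]  # W[i][j] = WLCS(p1[:i], p2[:j])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = max(W[i - 1][j], W[i][j - 1])  # leave p1[i-1] or p2[j-1] unmatched
            if p1[i - 1] == p2[j - 1]:
                best = max(best, W[i - 1][j - 1] + w[p1[i - 1]])  # match them
            W[i][j] = best
    return W[n][m]
```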

In our proofs, we will find the following relation between pairs of indices useful. For a pair $(x, y)$
and a pair $(x', y')$ of indices, we say that they are in conflict, or that they cross, if $x < x'$ and $y > y'$, or
$x > x'$ and $y < y'$.

3.1 Reducing WLCS to LCS 

The following simple reduction from WLCS to LCS gives a way to translate a lower bound for 
WLCS to a lower bound for LCS, and allows us to simplify our proofs. 

Lemma 2. Computing the WLCS of two sequences of length $n$ over $\Sigma$ with weights $w : \Sigma \to [K]$
can be reduced to computing the LCS of two sequences of length $O(Kn)$ over $\Sigma$.
Proof. The reduction simply copies each symbol $\ell \in \Sigma$ in each of the sequences $w(\ell)$ times. That
is, we define a mapping $f$ from symbols in $\Sigma$ to sequences of length up to $K$ so that for any $\ell \in \Sigma$,

$$f(\ell) = \ell^{w(\ell)} \in \Sigma^{w(\ell)}.$$

For a sequence $P$ of length $n$ over $\Sigma$, let $f(P) = \bigcirc_{i=1}^{n} f(P[i])$. That is, replace the $i$-th symbol
$P[i]$ with the sequence $f(P[i])$ defined above.

Note that $|f(P)| \le K|P|$, and the reduction follows from the next claim.

Claim 1. For any two sequences $P_1, P_2$ of length $n$ over $\Sigma$, the mapping $f$ satisfies:

$$\mathrm{WLCS}(P_1, P_2) = \mathrm{LCS}(f(P_1), f(P_2)).$$

Proof. For brevity of notation, we let $P'_1 = f(P_1)$ and $P'_2 = f(P_2)$.

First, observe that $\mathrm{WLCS}(P_1, P_2) \le \mathrm{LCS}(P'_1, P'_2)$, since for any common subsequence $X$ of
$P_1, P_2$, the sequence $f(X)$ is a common subsequence of $P'_1, P'_2$ and has length

$$|f(X)| = \sum_{i=1}^{|X|} |f(X[i])| = \sum_{i=1}^{|X|} w(X[i]) = W(X).$$

In the remainder of this proof, we show that $\mathrm{WLCS}(P_1, P_2) \ge \mathrm{LCS}(P'_1, P'_2)$. Let $X$ be the LCS
of $P'_1, P'_2$ and consider a corresponding matching $A$.

Let $x \in \{1,2\}$. We say that a symbol $\ell$ in $P'_x$ at index $i \le Kn$ belongs to interval $I_x(i) \in [n]$
iff this symbol was generated when mapping $P_x[I_x(i)]$ to the subsequence $f(\ell)$. Moreover, we say
that it is at index $J_x(i) \in [w(\ell)]$ in interval $I_x(i)$ iff it is the $J_x(i)$-th symbol in that interval.

We will go over the symbols $\ell \in \Sigma$ of the alphabet in an arbitrary order, and perform the
following modifications to $X$ and the matching $A$ for each such symbol in turn.

Go over the indices $i$ of $P'_1$ that are matched in $A$ to some index $j$ of $P'_2$ and for which $P'_1[i] = \ell$,
in increasing order. Consider the intervals $I_1(i)$ and $I_2(j)$, both of which contain the symbol $\ell$ exactly $w(\ell)$
times. Throughout our scan, we maintain the invariant that $i$ is the first index to be matched to
the interval $I_2(j)$.

If $J_1(i) = J_2(j) = 1$, and the next $w(\ell) - 1$ pairs in our matching $A$ match the rest of
the interval $I_1(i)$ to the interval $I_2(j)$, we do not need to modify anything, and we move on to the
next index $i'$ that is not part of this interval $I_1(i)$ and is matched to some index $j'$. Note that
at this point, $i'$ satisfies the invariant, since it cannot also be matched to the interval $I_2(j)$ by the
pigeonhole principle, and therefore $I_2(j') > I_2(j)$ and $i'$ is the first index to be matched to this
interval.

Otherwise, we modify $A$ so that now the whole intervals $I_1(i)$ and $I_2(j)$ are matched to one
another: for each $i', j'$ such that $I_1(i') = I_1(i)$, $I_2(j') = I_2(j)$, and $J_1(i') = J_2(j')$, we add the pair
$(i', j')$ to the matching $A$, and remove any conflicting pairs from $A$. We claim that we obtain a
matching of at least the original size, since we add $w(\ell)$ pairs and remove at most $w(\ell)$ pairs.
To see this, note that for a pair $(x, y)$ to be in conflict with one of the pairs we added, it must
be of one of the following three types: (1) $I_1(x) = I_1(i)$ and $I_2(y) = I_2(j)$, or (2) $I_1(x) = I_1(i)$ but
$I_2(y) > I_2(j)$, or (3) $I_2(y) = I_2(j)$ but $I_1(x) > I_1(i)$. Here, we use the invariant to rule out pairs
for which $I_1(x) < I_1(i)$ or $I_2(y) < I_2(j)$. However, in any matching $A$, there cannot be both pairs
of type (2) and pairs of type (3), since any two such pairs would cross. Therefore, we conclude that
either all conflicting pairs come from the interval $I_1(i)$ or they all come from the interval $I_2(j)$, and
in either case there are at most $w(\ell)$ of them. After this modification, we move on to the next index $i'$
that is not part of this interval $I_1(i)$ and is matched (in the new matching $A$) to some index $j'$;
as before, this $i'$ satisfies the invariant.

After we are done with all these modifications, we end up with a matching $A$ of size at least $|X|$
in which complete intervals are aligned to each other. Now, we can define a matching $A'$ between $P_1$
and $P_2$ that contains all pairs $(I_1(i), I_2(j))$ for which $(i,j) \in A$. In words, we contract the intervals
of $P'_1, P'_2$ to the original symbols of $P_1, P_2$. Finally, $A'$ corresponds to a common subsequence $X'$
of $P_1, P_2$, and $W(X') = |A| \ge |X|$, since each matched interval corresponds to some symbol $\ell$ and
contributes $w(\ell)$ matches to $A$ and a single match of weight $w(\ell)$ to $A'$. □

□ 
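Claim 1 can be sanity-checked on small random instances. The following Python sketch (the weights, the alphabet, and the function names are illustrative choices) implements the mapping $f$ of Lemma 2 as `expand` and compares $\mathrm{WLCS}(P_1,P_2)$ with $\mathrm{LCS}(f(P_1), f(P_2))$, computing the LCS as WLCS with unit weights. A finite test like this only supports the proof; it does not replace it.

```python
import random


def wlcs(p1, p2, w):
    """Quadratic DP for the weighted LCS; w maps symbols to positive weights."""
    W = [[0] * (len(p2) + 1) for _ in range(len(p1) + 1)]
    for i in range(1, len(p1) + 1):
        for j in range(1, len(p2) + 1):
            W[i][j] = max(W[i - 1][j], W[i][j - 1])
            if p1[i - 1] == p2[j - 1]:
                W[i][j] = max(W[i][j], W[i - 1][j - 1] + w[p1[i - 1]])
    return W[-1][-1]


def lcs(p1, p2):
    """Plain LCS as the special case of unit weights."""
    return wlcs(p1, p2, {s: 1 for s in set(p1) | set(p2)})


def expand(p, w):
    """The mapping f of Lemma 2: replace every symbol s by w[s] copies of itself."""
    return "".join(s * w[s] for s in p)


def check_claim_1(trials=300, seed=0):
    rng = random.Random(seed)
    w = {"a": 1, "b": 2, "c": 3}  # illustrative weights, so K = 3
    for _ in range(trials):
        p1 = "".join(rng.choice("abc") for _ in range(rng.randint(0, 7)))
        p2 = "".join(rng.choice("abc") for _ in range(rng.randint(0, 7)))
        if wlcs(p1, p2, w) != lcs(expand(p1, w), expand(p2, w)):
            return False
    return True
```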


3.2 Reducing Most-Orthogonal Vectors to LCS 

We are now ready to present our main reduction, proving our hardness result for LCS. 

Theorem 4. Most-Orthogonal Vectors on two lists $\{\alpha_i\}_{i\in[n]}$ and $\{\beta_j\}_{j\in[n]}$ of $n$ binary vectors in $d$
dimensions ($\alpha_i, \beta_j \in \{0,1\}^d$) can be reduced to the LCS problem on two sequences of length $n \cdot d^{O(1)}$
over an alphabet of size 7.

Proof. We will proceed in two steps. First, we will show that WLCS is at least as hard as the
Most-Orthogonal Vectors problem. Second, since the symbols in the constructed WLCS
instance will have small weights, an application of Lemma 2 will allow us to conclude that LCS is at
least as hard as the Most-Orthogonal Vectors problem. Our alphabet will be $\Sigma = \{0,1,2,3,4,5,6\}$.

We start with the reduction to WLCS. Let $\alpha, \beta$ denote two vectors from the Most-Orthogonal Vectors
instance, from the first and the second set, respectively.

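We construct our coordinate gadgets as follows. For $i \in [d]$ we define

$$CG_1(\alpha, i) = \begin{cases} 5465 & \text{if } \alpha[i] = 0 \\ 545 & \text{otherwise} \end{cases} \qquad CG_2(\beta, i) = \begin{cases} 5645 & \text{if } \beta[i] = 0 \\ 565 & \text{otherwise} \end{cases}$$

We set the weight function so that $w(4) = w(6) = 1$ and $w(5) = X = 100d$.
These gadgets satisfy the following equalities:

$$\mathrm{WLCS}(CG_1(\alpha, i), CG_2(\beta, i)) = \begin{cases} 2X + 1 & \text{if } \alpha[i] \cdot \beta[i] = 0 \\ 2X & \text{otherwise} \end{cases}$$

Indeed, the two 5 symbols can always be matched, and a 4 or a 6 can additionally be matched between
them unless the gadgets are $545$ and $565$, i.e., unless $\alpha[i] \cdot \beta[i] = 1$.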

Now, we define the vector gadgets as a concatenation of the coordinate gadgets. Let $R_1(\alpha) =
\bigcirc_{i\in[d]} CG_1(\alpha, i)$ and $R_2(\beta) = \bigcirc_{i\in[d]} CG_2(\beta, i)$, and

$$VG_1(\alpha) = 1 \circ R_1(\alpha), \qquad VG_2(\beta) = R_2(\beta) \circ 1.$$

The weight of the symbol 1 is $w(1) = A = (r+1) \cdot 2X + (d - (r+1))(2X + 1)$. It is now easy to
prove the following claims.
prove the following claims. 

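Claim 2. If two vectors $\alpha, \beta$ are $r$-far, then:

$$\mathrm{WLCS}(VG_1(\alpha), VG_2(\beta)) \ge A + 1 = r \cdot 2X + (d - r)(2X + 1).$$

Proof. For each $i \in [d]$, match $CG_2(\beta, i)$ to $CG_1(\alpha, i)$ optimally. Since at most $r$ coordinates have
$\alpha[i] \cdot \beta[i] = 1$, this gives a weight of at least $A + 1 = r \cdot 2X + (d - r)(2X + 1)$. □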

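Claim 3. If two vectors $\alpha, \beta$ are $r$-close, then:

$$\mathrm{WLCS}(VG_1(\alpha), VG_2(\beta)) = A.$$

Proof. $\mathrm{WLCS}(VG_1(\alpha), VG_2(\beta)) \ge A$ holds because we can match the 1 symbols, which gives weight
$A$.

Now we prove that $\mathrm{WLCS}(VG_1(\alpha), VG_2(\beta)) \le A$. If we match the 1 symbols, then we cannot
match any other symbols, since 1 is the first symbol of $VG_1(\alpha)$ and the last symbol of $VG_2(\beta)$;
so the inequality holds. Thus, we now assume that the 1 symbols are not matched.

Now one can check that if there is a 5 symbol in $VG_1(\alpha)$ or $VG_2(\beta)$ that is not matched to a
5 symbol, then we cannot achieve weight $A$ even if we match all the other symbols (except for the
1 symbols): losing a weight of $X = 100d$ cannot be compensated by the at most $2d$ matched 4 and 6 symbols.
Therefore, we assume that all the 5 symbols are matched. Since each coordinate gadget contains exactly two
of them, the $i$-th coordinate gadget of $VG_1(\alpha)$ is then aligned with the $i$-th coordinate gadget of $VG_2(\beta)$.
The required inequality follows from the fact that there are at least $r + 1$ coordinates where $\alpha$ and $\beta$ both
have 1 (the vectors are $r$-close), and from the construction of the coordinate gadgets. □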
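Claims 2 and 3 can be checked exhaustively for small $d$. The following Python sketch builds $VG_1(\alpha) = 1 \circ R_1(\alpha)$ and $VG_2(\beta) = R_2(\beta) \circ 1$ with $w(1) = A$. It restricts to $0 \le r < d$ so that every term in the formula for $A$ is nonnegative, and verifies for every pair of vectors that far pairs reach at least $A + 1$ while close pairs give exactly $A$. Helper names are illustrative.

```python
import itertools


def wlcs(p1, p2, w):
    """Quadratic DP for the weighted LCS."""
    W = [[0] * (len(p2) + 1) for _ in range(len(p1) + 1)]
    for i in range(1, len(p1) + 1):
        for j in range(1, len(p2) + 1):
            W[i][j] = max(W[i - 1][j], W[i][j - 1])
            if p1[i - 1] == p2[j - 1]:
                W[i][j] = max(W[i][j], W[i - 1][j - 1] + w[p1[i - 1]])
    return W[-1][-1]


def vg1(alpha):
    """VG_1(alpha) = 1 followed by the coordinate gadgets CG_1."""
    return "1" + "".join("5465" if b == 0 else "545" for b in alpha)


def vg2(beta):
    """VG_2(beta) = the coordinate gadgets CG_2 followed by 1."""
    return "".join("5645" if b == 0 else "565" for b in beta) + "1"


def gadget_weights(d, r):
    X = 100 * d
    A = (r + 1) * 2 * X + (d - (r + 1)) * (2 * X + 1)
    return A, {"1": A, "4": 1, "5": X, "6": 1}


def claims_2_and_3_hold(d):
    for r in range(d):
        A, w = gadget_weights(d, r)
        for alpha in itertools.product((0, 1), repeat=d):
            for beta in itertools.product((0, 1), repeat=d):
                value = wlcs(vg1(alpha), vg2(beta), w)
                far = sum(a * b for a, b in zip(alpha, beta)) <= r
                if far and value < A + 1:  # Claim 2 would fail
                    return False
                if not far and value != A:  # Claim 3 would fail
                    return False
    return True
```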

Finally, we combine the vector gadgets into two sequences. Let $VG'_1(\alpha) = 0 \circ VG_1(\alpha) \circ 2$ and
$VG'_2(\beta) = 0 \circ VG_2(\beta) \circ 2 \circ 3$. Let $f$ be a dummy vector of length $d$ that is all 1.

$$P_1 = 3^{2n} \circ \bigcirc_{i=1}^{n} VG'_1(\alpha_i) \circ 3^{2n}$$

$$P_2 = 3 \circ \bigcirc_{i=1}^{n-1} VG'_2(f) \circ \bigcirc_{j=1}^{n} VG'_2(\beta_j) \circ \bigcirc_{i=1}^{n-1} VG'_2(f)$$

We set the weights so that $w(3) = B = A^2$ and $w(0) = w(2) = C = B^2$.

Let $E_U = 2C + A$, and $E_0 = n \cdot E_U + 2n \cdot B$.

The following two lemmas prove that there is a gap in the WLCS of our two sequences when
there is a pair of vectors that are $r$-far, as opposed to when there is none.

Lemma 3. If there is a pair of vectors that are $r$-far, then $\mathrm{WLCS}(P_1, P_2) \ge E_0 + 1$.

Proof. Let $i, j$ be such that $\alpha_i, \beta_j$ are $r$-far. Match $VG'_1(\alpha_i)$ and $VG'_2(\beta_j)$ to get a weight of at least
$2C + r \cdot 2X + (d - r)(2X + 1) = E_U + 1$. Match the $i - 1$ vector gadgets to the left of $VG'_1(\alpha_i)$ to
the $i - 1$ vector gadgets immediately to the left of $VG'_2(\beta_j)$, and similarly, match the $n - i$ gadgets
to the right. The total additional weight we get is at least $(n - 1) \cdot E_U$. Finally, note that after
the above matches, only $n - 1$ out of the $3n - 1$ 3-symbols in $P_2$ are surrounded by matched
symbols. The remaining $2n$ 3-symbols can be matched, giving an additional weight of $2n \cdot B$. The
total weight is at least $E_U + 1 + (n - 1) \cdot E_U + 2n \cdot B = E_0 + 1$. □
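Here is a small Python sketch of the full construction, useful for replaying the matching from the proof of Lemma 3 on toy inputs. It assumes that $P_1$ is padded with $2n$ copies of the symbol 3 on each side, and that $P_2$ consists of one leading 3 followed by $n-1$ dummy gadgets for the all-ones vector, the $n$ gadgets $VG'_2(\beta_j)$, and $n-1$ more dummy gadgets. It uses $E_U = 2C + A$ and the threshold $E_0 = n \cdot E_U + 2n \cdot B$. The sketch checks that $P_2$ contains $3n-1$ symbols 3 and that an instance with an $r$-far pair reaches weight at least $E_0 + 1$. Helper names are hypothetical.

```python
def wlcs(p1, p2, w):
    """Quadratic DP for the weighted LCS (Python ints handle the large weights)."""
    W = [[0] * (len(p2) + 1) for _ in range(len(p1) + 1)]
    for i in range(1, len(p1) + 1):
        for j in range(1, len(p2) + 1):
            W[i][j] = max(W[i - 1][j], W[i][j - 1])
            if p1[i - 1] == p2[j - 1]:
                W[i][j] = max(W[i][j], W[i - 1][j - 1] + w[p1[i - 1]])
    return W[-1][-1]


def vg1_prime(alpha):
    """VG'_1(alpha) = 0 . 1 . R_1(alpha) . 2"""
    return "0" + "1" + "".join("5465" if b == 0 else "545" for b in alpha) + "2"


def vg2_prime(beta):
    """VG'_2(beta) = 0 . R_2(beta) . 1 . 2 . 3"""
    return "0" + "".join("5645" if b == 0 else "565" for b in beta) + "1" + "2" + "3"


def build_instance(alphas, betas, r):
    n, d = len(alphas), len(alphas[0])
    X = 100 * d
    A = (r + 1) * 2 * X + (d - (r + 1)) * (2 * X + 1)
    B = A * A
    C = B * B
    w = {"0": C, "1": A, "2": C, "3": B, "4": 1, "5": X, "6": 1}
    dummy = vg2_prime((1,) * d)  # gadget of the all-ones dummy vector
    P1 = "3" * (2 * n) + "".join(vg1_prime(a) for a in alphas) + "3" * (2 * n)
    P2 = "3" + dummy * (n - 1) + "".join(vg2_prime(b) for b in betas) + dummy * (n - 1)
    E_U = 2 * C + A
    E_0 = n * E_U + 2 * n * B
    return P1, P2, w, E_0
```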

Lemma 4. If there is no pair of vectors that are $r$-far, then $\mathrm{WLCS}(P_1, P_2) \le E_0$.

Proof. The main part of the proof will be dedicated to showing that if the $n$ vector gadgets in $P_1$
are matched to a substring of $n'$ vector gadgets from $P_2$, then $n'$ must be equal to $n$. This will
follow since if $n' < n$, then at least one of the 0/2 symbols in $P_1$ will remain unmatched, and if
$n' > n$, then fewer than $2n$ of the 3 symbols in $P_2$ can be matched. The large weights we gave 0/2
and 3 make this impossible in an optimal matching. It will be easy to see that in any matching in
which $n = n'$, the total weight is at most $E_0$.

Now, we introduce some notation. Let $L \le L'$ and define $W(L, L')$ to be the optimal score of
matching two sequences $T, T'$, where $T$ is composed of $L$ vector gadgets $VG'_1(\alpha)$ and $T'$ is composed
of $L'$ vector gadgets $VG'_2(\beta)$, where no pair $\alpha, \beta$ is $r$-far. Define $W_0(L, L')$ similarly, except that
we restrict the matchings so that all 0 or 2 symbols in $T$ (the shorter sequence) must be matched. In
the following two claims we prove an upper bound on $W(L, L')$, via an upper bound on $W_0(L, L')$.

Claim 4. For any integers $1 \le L \le L'$, we can upper bound $W_0(L, L') \le L \cdot E_U + (L' - L) \cdot (B - 1)$.

Proof. Let $T, T'$ be two sequences with $L, L'$ vector gadgets, respectively. We will refer to these
“vector gadgets” as intervals. Consider an optimal matching of $T$ and $T'$ in which all the 0 and 2
symbols of $T$ are matched, i.e., a matching that achieves weight $W_0(L, L')$; we will upper bound
its weight by $L \cdot E_U + (L' - L) \cdot (B - 1)$. Note that in such a matching, each interval of $T$ must
be matched completely within one or more intervals of $T'$, and each interval of $T'$ has matches to
at most one interval from $T$ (otherwise, some 0 or 2 symbol in $T$ would not be
matched).

Let $x$ be the number of intervals of $T$ that contribute at most $E_u$ to the weight of our optimal
matching. Each of the other $L - x$ intervals must be matched to a substring of $T'$ that
contains at least two intervals, for the following reason. The 0 and 2 symbols of the interval of $T$
must be matched, and if the matching stays within a single interval of $T'$ and has more than $E_u$
weight, then by Claim 3 we have a pair that is $r$-far. Thus, using the fact that there are
only $L'$ intervals in $T'$, we get the condition

$$x + 2(L - x) \le L'.$$

We now give an upper bound on the weight of our matching by summing the contributions of
the intervals of $T$: there are $x$ intervals contributing at most $E_u$ weight each, and there are $L - x$ intervals
matched to $T'$ with unbounded contribution; however, even if all the symbols of an interval
are matched, it can contribute at most $E_B = 2C + A + d(2X + 2)$. Therefore, the total weight of
the matching can be upper bounded by

$$E_M \le (L - x) \cdot E_B + x \cdot E_u.$$

We claim that no matter what $x$ is, as long as the above condition holds, this expression is at most
$L \cdot E_u + (L' - L) \cdot (B - 1)$.

To maximize this expression, we choose the smallest $x$ that satisfies the above condition,
since $E_B > E_u$; this gives $x = \max\{0, 2L - L'\}$. A key inequality, which we will use multiple
times in the proof and which follows from the fact that the 0/2/3 symbols are much more important than
the rest, is $E_B < E_u + B - 1$. It holds since $E_B - E_u \le A + d(2X + 2) \le 1000d^2 < B$.

First, consider the case where $L \le L'/2$, and therefore $x = 0$, which means that all the intervals
of $T$ might be fully matched. Using $E_B < E_u + B - 1$ and $L' - L \ge L'/2 \ge L$, we get
the desired upper bound:

$$E_M \le L \cdot E_B < L \cdot (E_u + B - 1) \le L \cdot E_u + (L' - L) \cdot (B - 1).$$

Now, assume that $L > L'/2$, and therefore $x = 2L - L'$. In this case, setting $x$ as small
as possible, the upper bound becomes

$$E_M \le (L' - L) \cdot E_B + (2L - L') \cdot E_u = L \cdot E_u + (L' - L) \cdot (E_B - E_u),$$
which is at most $L \cdot E_u + (L' - L) \cdot (B - 1)$, since $E_B < E_u + B - 1$. □
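The case analysis in Claim 4 only uses the counting condition $x + 2(L - x) \le L'$ and the inequality $E_B < E_u + B - 1$. As a sanity check (not part of the proof), the following Python sketch brute-forces the bound over small $L \le L'$ and every admissible $x$. The function name and the numeric weights are hypothetical placeholders chosen only so that $E_u < E_B < E_u + B - 1$; they are not the weights of the construction.

```python
def claim4_bound_holds(E_u, E_B, B, max_len=30):
    """Brute-force the Claim 4 inequality
        (L - x) * E_B + x * E_u <= L * E_u + (L' - L) * (B - 1)
    for all 1 <= L <= L' <= max_len and every admissible x,
    i.e. 0 <= x <= L with x + 2 * (L - x) <= L'."""
    if not E_u < E_B < E_u + B - 1:
        raise ValueError("need E_u < E_B < E_u + B - 1")
    for L_prime in range(1, max_len + 1):
        for L in range(1, L_prime + 1):
            target = L * E_u + (L_prime - L) * (B - 1)
            for x in range(L + 1):
                if x + 2 * (L - x) > L_prime:   # violates the counting condition
                    continue
                if (L - x) * E_B + x * E_u > target:
                    return False
    return True


# Hypothetical placeholder weights: E_B - E_u = 50 < B - 1 = 99.
print(claim4_bound_holds(E_u=1000, E_B=1050, B=100))  # True
```

The check succeeds for a simple reason: every admissible $x$ satisfies $L - x \le L' - L$, so each of the $L - x$ fully matched intervals is paid for by one surplus interval of $T'$, at a cost of $E_B - E_u < B - 1$.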
Next, we prove by induction that leaving 0/2 symbols in the shorter sequence unmatched can
only worsen the weight of the optimal matching.

Claim 5. For any integers $1 \le L \le L'$, we can upper bound $W(L, L') \le L \cdot E_u + (L' - L) \cdot (B - 1)$.

Proof. We will prove by induction on $i \ge 2$ that for all $L' \ge L \ge 1$ such that $L + L' \le i$,
$W(L, L') \le L \cdot E_u + (L' - L) \cdot (B - 1)$.

The base case is $i = 2$ and $L = L' = 1$. Then $W(1, 1) \le E_u$ by Claim 3, since no pair is $r$-far, and we are done.

For the inductive step, assume that the statement is true for all $i' \le i - 1$; we will prove
it for $i$. Let $L, L'$ be such that $1 \le L \le L'$ and $L + L' = i$, and let $T, T'$ be sequences with $L, L'$
intervals (vector gadgets), respectively. Consider an optimal (unrestricted) matching of $T$
and $T'$, and denote its weight by $E_M$. Our goal is to show that $E_M \le L \cdot E_u + (L' - L) \cdot (B - 1)$.

If every 0/2 symbol in $T$ is matched, then, by definition, the weight cannot be more than
$W_Q(L, L')$, and by Claim 4 we are done. Otherwise, consider the first unmatched 0/2 symbol, call
it $x$; there are two cases.

The x = 0 case: If $x$ is the first 0 in $T$, then the first 0 in $T'$ must be matched to some 0
after $x$ (otherwise we could add this pair to the matching without violating any other pairs), which
implies that none of the symbols in the interval starting at $x$ can be matched, since such matches
would conflict with the pair containing this first 0. Otherwise, consider the 2 that appears
right before $x$ and note that, by our choice of $x$ as the first unmatched 0/2, it must be matched to some $y = 2$ in $T'$.
Now, there are two possibilities: either there are no more intervals in $T'$
after $y$, or there is a 0 right after $y$ in $T'$ that is matched to a 0 in $T$ that is after $x$ (from a
later interval in $T$). In either case, the interval starting at $x$ (and ending at the 2 after
it) is completely unmatched in our matching. Therefore, in this case, we let $T_1$ be the sequence
with $L - 1$ intervals obtained from $T$ by removing the interval starting at $x$. The
weight of our matching does not change if we view it as a matching between $T_1$ and $T'$ instead
of $T$ and $T'$, which implies that $E_M \le W(L - 1, L')$. Using our inductive hypothesis, we conclude that
$E_M \le (L - 1) \cdot E_u + (L' - L + 1) \cdot (B - 1) \le L \cdot E_u + (L' - L) \cdot (B - 1)$, since $E_u > B$, and we are
done.

The x = 2 case: The 0 at the start of $x$'s interval must have been matched to some $y = 0$ in $T'$. Let
$z$ be the 2 at the end of $y$'s interval. Note that $z$ must be matched to some $w = 2$ in $T$ after $x$, since
otherwise we could add the pair $(x, z)$ to the matching, gaining a weight of $C$; the only possible
conflicts we would create are with pairs containing a symbol inside the $y \to z$ interval or inside
$x$'s interval, and removing all such pairs loses at most $A + d(2X + 2)$, which is much
less than the gain of $C$, implying that our matching could not have been optimal. Therefore, there
are $c \ge 2$ intervals in $T$ that are matched to a single interval in $T'$: all the intervals from the 0
that starts $x$'s interval up to $w$ are matched to the $y \to z$ interval. Let $T_1$ be the sequence obtained
from $T$ by removing all these $c$ intervals, and let $T_2$ be the sequence obtained from $T'$ by removing
the $y \to z$ interval. Our matching can be split into two parts: a matching between $T_1$ and $T_2$, and
the matching of the $y \to z$ interval to the removed intervals. The contribution of the latter part
to the weight of the matching is at most the weight of all the symbols in an interval, which
is $E_B$. By the inductive hypothesis, any matching of $T_1$ and $T_2$ has weight at
most $W(L - c, L' - 1) \le (L - c) \cdot E_u + (L' - 1 - L + c) \cdot (B - 1)$. Summing the two bounds on
the contributions, we get that the total weight of the matching is at most

$$E_M \le E_B + (L - c) \cdot E_u + (L' - L + c - 1) \cdot (B - 1) = L \cdot E_u + (L' - L) \cdot (B - 1) + (c - 1) \cdot (B - 1) + E_B - c \cdot E_u.$$

However, $E_B < 1.1 \cdot E_u$ and $(c - 1.1) \cdot E_u \ge 10(c - 1.1) \cdot B > (c - 1) \cdot B$, so $(c - 1) \cdot (B - 1) + E_B - c \cdot E_u < (c - 1) \cdot B - (c - 1.1) \cdot E_u < 0$. This implies
that $E_M$ is at most $L \cdot E_u + (L' - L) \cdot (B - 1)$, and we are done. □
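The x = 2 case reduces to the sign of $(c - 1)(B - 1) + E_B - c \cdot E_u$. The sketch below searches small integer parameters that satisfy the two assumptions used above, $E_u \ge 10B$ and $E_B < 1.1 \cdot E_u$, for a violation. The parameter ranges and function names are hypothetical and only illustrate the arithmetic; they are not part of the construction.

```python
def x2_case_excess(c, B, E_u, E_B):
    """Amount by which the x = 2 bound exceeds L*E_u + (L' - L)*(B - 1);
    the proof needs this quantity to be negative."""
    return (c - 1) * (B - 1) + E_B - c * E_u


def find_x2_counterexample(max_B=40, max_c=40, spread=100):
    """Search c >= 2, E_u >= 10*B and E_B < 1.1*E_u for a non-negative excess.
    Returns a violating tuple (c, B, E_u, E_B), or None if none is found."""
    for B in range(2, max_B + 1):
        for E_u in range(10 * B, 10 * B + spread):
            # The excess grows with E_B, so test the largest integer with 10*E_B < 11*E_u.
            E_B = (11 * E_u + 9) // 10 - 1
            for c in range(2, max_c + 1):
                if x2_case_excess(c, B, E_u, E_B) >= 0:
                    return (c, B, E_u, E_B)
    return None


print(find_x2_counterexample())  # None
```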

We are now ready to complete the proof of the lemma. Consider an optimal matching of $P_1$
and $P_2$. Let $x$ and $y$ be the first and last 3-symbols in $P_2$ that are not matched, respectively. Note
that there cannot be any matched 3-symbols between $x$ and $y$, since otherwise we could match
either $x$ or $y$ and gain extra weight without incurring any loss. Moreover, $x$ cannot be
the first symbol in $P_2$ and $y$ cannot be the last one, since those must be matched in an optimal
alignment. The substring between the 3 preceding $x$ and the 3 following $y$ contains $n'$ intervals
(vector gadgets) for some $1 \le n' \le 3n - 2$. If all the 3's are matched, we let $n' = 1$ and focus on
the only interval (vector gadget) of $P_2$ that has matched non-3-symbols.

We can now bound the total weight of the matching by the sum of the maximum possible
contribution of these $n'$ intervals and the contribution of the rest of $P_2$. The substring before and
including the 3-symbol preceding $x$, and the substring after and including the 3-symbol following
$y$, can only contribute 3's to the matching, and they contain exactly $3n - 1 - (n' - 1)$ such 3-symbols,
giving a contribution of $(3n - n') \cdot B$. To bound the contribution of the $n'$ intervals, we
use Claim 5: since no 3-symbols are matched in this part, we can "remove" those symbols for the
analysis to obtain two sequences $T, T'$ composed of $n$ and $n'$ vector gadgets, respectively, in which no
pair is $r$-far. The contribution of the $T, T'$ part depends on $n$ and $n'$:

If $n' < n$, then by Claim 5 with $L = n'$ and $L' = n$, the contribution is at most $n' \cdot E_u + (n - n') \cdot (B - 1)$,
and the total weight of our matching can be upper bounded by

$$(3n - n') \cdot B + n' \cdot E_u + (n - n') \cdot (B - 1),$$

which is maximized when $n'$ is as large as possible, since $E_u > 2B - 1$. Thus, setting $n' = n$, we
get the upper bound $(3n - n) \cdot B + n \cdot E_u = E_Q$.

Otherwise, if $n' \ge n$, we apply Claim 5 with $L = n$ and $L' = n'$, and get that the contribution is at
most $n \cdot E_u + (n' - n) \cdot (B - 1)$, so the total weight of our matching can be upper bounded by

$$(3n - n') \cdot B + n \cdot E_u + (n' - n) \cdot (B - 1) = n \cdot E_u + 2n \cdot B - (n' - n) \le E_Q.$$


□ 
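The final step of Lemma 4 is a one-variable maximization over $n'$. The sketch below evaluates the two upper bounds derived above for every $1 \le n' \le 3n - 2$ and checks that their maximum is exactly $E_Q = n \cdot E_u + 2n \cdot B$, attained at $n' = n$. It assumes $E_u > 2B - 1$, as in the text; the function names and concrete numbers are hypothetical placeholders.

```python
def lemma4_weight_bound(n, n_prime, E_u, B):
    """Upper bound on the matching weight when the window between unmatched 3's holds n' gadgets."""
    threes = (3 * n - n_prime) * B                      # 3-symbols outside the window
    if n_prime < n:                                     # Claim 5 with L = n', L' = n
        return threes + n_prime * E_u + (n - n_prime) * (B - 1)
    return threes + n * E_u + (n_prime - n) * (B - 1)   # Claim 5 with L = n, L' = n'


def lemma4_max_is_E_Q(n, E_u, B):
    """True iff the maximum over 1 <= n' <= 3n - 2 equals E_Q and is attained at n' = n."""
    E_Q = n * E_u + 2 * n * B
    bounds = {k: lemma4_weight_bound(n, k, E_u, B) for k in range(1, 3 * n - 1)}
    return max(bounds.values()) == E_Q == bounds[n]


print(lemma4_max_is_E_Q(5, E_u=1000, B=100))         # True
print(lemma4_weight_bound(5, 7, E_u=1000, B=100))    # 5998
```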

To conclude our reduction, we note that the largest weight used in our weight function is polynomial
in $d$, and therefore the reduction of Lemma 2 gives two unweighted sequences $f(P_1), f(P_2)$
of length $n \cdot d^{O(1)}$ for which the LCS equals the WLCS of our $P_1, P_2$. □
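The weighted-to-unweighted correspondence invoked here (Lemma 2) can be tested on small inputs. The sketch below assumes that the map $f$ replaces each symbol $s$ by a run of $w(s)$ copies of $s$, as recalled in the proof of Lemma 13 in Section 6; it computes WLCS with the standard quadratic dynamic program and compares it to the LCS of the expanded strings on random instances. This is an empirical illustration, not a proof, and the helper names are hypothetical.

```python
import random


def wlcs(p, q, w):
    """Weighted LCS of p and q: maximum total weight w[s] of a common subsequence."""
    prev = [0] * (len(q) + 1)
    for i in range(1, len(p) + 1):
        cur = [0] * (len(q) + 1)
        for j in range(1, len(q) + 1):
            cur[j] = max(prev[j], cur[j - 1])          # skip p[i-1] or q[j-1]
            if p[i - 1] == q[j - 1]:                   # or match the two symbols
                cur[j] = max(cur[j], prev[j - 1] + w[p[i - 1]])
        prev = cur
    return prev[-1]


def expand(p, w):
    """The map f: replace every symbol s by a run of w[s] copies of s."""
    return [s for s in p for _ in range(w[s])]


def lcs(p, q):
    """Unweighted LCS, i.e. WLCS with unit weights."""
    return wlcs(p, q, {s: 1 for s in set(p) | set(q)})


rng = random.Random(0)
for _ in range(300):
    w = {s: rng.randint(1, 4) for s in "abc"}
    p = [rng.choice("abc") for _ in range(rng.randint(0, 6))]
    q = [rng.choice("abc") for _ in range(rng.randint(0, 6))]
    assert lcs(expand(p, w), expand(q, w)) == wlcs(p, q, w)
print("LCS of the expanded strings equalled WLCS on every trial")
```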

4 Hardness for DTWD 

In this section, we complete the proof of Theorem 1 by showing that a truly sub-quadratic algorithm
for DTWD implies a truly sub-quadratic algorithm for the Most-Orthogonal Vectors problem.

We first show that we can modify the reduction from CNF-SAT to Edit-Distance from [BI14]
so that we get a reduction from Most-Orthogonal Vectors to Edit-Distance. We will later use
properties of the two sequences produced in this reduction; call them $P'_1, P'_2$. In particular, we
will show that there is an easy transformation of $P'_1$ into a sequence $S_1$ and of $P'_2$ into a
sequence $S_2$ so that $\mathrm{EDIT}(P'_1, P'_2) = \mathrm{DTWD}(S_1, S_2)$. This will give the desired reduction from
Most-Orthogonal Vectors to DTWD.
4.1 Reducing Most-Orthogonal Vectors to Edit-Distance

Before showing the reduction from Most-Orthogonal Vectors to Edit-Distance, let us recast the
reduction of [BI14] as a reduction from Orthogonal Vectors instead of CNF-SAT.


Reducing Orthogonal Vectors to Edit-Distance. Instead of having $2^{N/2}$ partial assignments
for the first half of the variables and $2^{N/2}$ partial assignments for the second half of the variables,
we have $n$ vectors in each of the first and second sets of vectors (we replace $2^{N/2}$ by $n$ in the argument).
Instead of having $M$ clauses, we have $d$ coordinates for every vector (we replace $M$ by $d$ in the
argument).

Instead of having clause gadgets, we have coordinate gadgets. For a vector $\alpha$ from the first set
of vectors $\{\alpha_i\}_{i \in [n]}$ and $j \in [d]$, we define a coordinate gadget,


$CG_1(\alpha, j) =$


O l iO l n lo l l °l l oO h if a[j\ = 0, 
q^i o z °0^° 0 Z ° I^oqD otherwise. 


For a vector $\beta$ from the second set of vectors $\{\beta_i\}_{i \in [n]}$ and $j \in [d]$,


$CG_2(\beta, j) =$


0*10*00*0 ho ho0*1 if (3\j] = 0, 
0*1 i*o i*o ;L*o ].*o 0*1 otherwise. 


We leave g the same: g = 0 2 1 10 2 0 io l Zo l io l Zo 0 Zl . 

Instead of assignment gadgets , we have vector gadgets. 


$$VG_1(\alpha_i) = Z_1 L V_0 R Z_2 \quad \text{and} \quad VG_2(\beta_i) = V_1 D V_2,$$

where $R = \bigcirc_{j \in [d]} CG_1(\alpha_i, j)$ and $D = \bigcirc_{j \in [d]} CG_2(\beta_i, j)$.

Then, we replace the statement "$\psi$ is satisfied by $a_1 \vee a_2$" with "vectors $\alpha_{i_1}$ and $\beta_{i_2}$ are orthogonal"
and the statement "$\psi$ is satisfiable" with "there is a vector from the first set of vectors and
a vector from the second set of vectors that are orthogonal".

For a vector $v$ and $k \in \{1, 2\}$, we have $VG'_k(v) = 2^T\, VG_k(v)\, 2^T$ instead of $AG'_k$. We set
$f \in \{0, 1\}^d$ to have $f[i] = 1$ for all $i \in [d]$.

We define the sequences as 

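$$P_1 = \bigcirc_{i \in [n]} VG'_1(\alpha_i),$$

$$P_2 = \Big(\bigcirc_{i=1}^{n-1} VG'_2(f)\Big) \Big(\bigcirc_{i \in [n]} VG'_2(\beta_i)\Big) \Big(\bigcirc_{i=1}^{n-1} VG'_2(f)\Big).$$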

This completes the modification of the argument. We can check that we never use any property 
of CNF-SAT that Orthogonal Vectors does not have. 


Reducing Most-Orthogonal Vectors to Edit-Distance. Next, we modify the construction to
show that Edit-Distance is a hard problem under a weaker assumption, namely that the Most-Orthogonal Vectors
problem does not have a truly sub-quadratic algorithm (as conjectured earlier).

Theorem 5. Edit-Distance does not have a strongly subquadratic time algorithm unless the Most-Orthogonal Vectors
problem has a strongly subquadratic algorithm.
Proof. We describe how to change the arguments from [BI14] to get the necessary reduction. We
make all the modifications from the discussion above, as well as the following.

We change g as follows, 

g = 0^ _ ( 1+ 5 2io )l 1+ 5 2 '°0^0 Zo l io l ?0 l io O /l . 

We replace Lemma 1 from [BI14] with the following lemma.

Lemma 5. If $\alpha_{i_1}$ and $\beta_{i_2}$ are far vectors, then


$$\mathrm{EDIT}(VG_1(\alpha_{i_1}), VG_2(\beta_{i_2})) \le 2l_2 + l + d\,l_0 + k \cdot 2l_0 =: E_s.$$

Proof. We apply the same transformations of sequences as in Lemma 1 from [BI14], except that we get the
upper bound $E_s$ on the cost. □

We replace Lemma 2 from [BI14] with the following lemma.

Lemma 6. If $\alpha_{i_1}$ and $\beta_{i_2}$ are close vectors, then

$$\mathrm{EDIT}(VG_1(\alpha_{i_1}), VG_2(\beta_{i_2})) = 2l_2 + l + d\,l_0 + k \cdot 2l_0 + d =: E_u.$$

Proof. The proof proceeds along the same lines as the one for Lemma 2 from [BI14]. □

This finishes the description of the necessary changes. □ 

4.2 Reducing Most-Orthogonal Vectors to DTWD 

We are now ready to present our main reduction to DTWD. 

Theorem 6. If DTWD over sequences of symbols from an alphabet of size 5 can be solved in 
strongly sub-quadratic time, then Most-Orthogonal Vectors can also be solved in truly sub-quadratic 
time. 

Proof. The main arguments in this proof are provided in Lemmas 7 and 8 below. Here we explain
why these two lemmas complete the proof of our theorem.

Consider arbitrary sequences of symbols $Q_1$ and $Q_2$. On the one hand, in Lemma 7 we will
show that for a simple transformation $f$,

$$\mathrm{EDIT}(Q_1, Q_2) \le \mathrm{DTWD}(f(Q_1), f(Q_2)).$$

On the other hand, in Lemma 8 below we will show that

$$\mathrm{EDIT}(P'_1, P'_2) \ge \mathrm{DTWD}(f(P'_1), f(P'_2))$$

if $P'_1$ and $P'_2$ are the sequences constructed in Theorem 5.

Together, the two inequalities imply that $\mathrm{EDIT}(P'_1, P'_2) = \mathrm{DTWD}(f(P'_1), f(P'_2))$. Hence we
obtain the same hardness result for DTWD that we had for Edit-Distance, provided that $f$ is a simple
transformation. We will see that $f$ is indeed a very simple transformation,
i.e., $f(P'_1)$ and $f(P'_2)$ can be computed in time $O(|P'_1|)$ and $O(|P'_2|)$, respectively.

$P'_1$ and $P'_2$ are sequences of symbols over an alphabet of size 4, and the transformation $f$ introduces one
extra symbol. Thus, the final sequences are over an alphabet of size 5. □
For an alphabet $\Sigma$, a symbol $a \notin \Sigma$, a sequence $Q = q_1 q_2 \ldots q_p \in \Sigma^p$ of length $p$, and a vector $r$
of $p + 1$ positive integers, we define the operation


$$A^r_a(Q) := a^{r_1} q_1 a^{r_2} q_2 a^{r_3} \cdots a^{r_p} q_p a^{r_{p+1}}.$$

Lemma 7. For any two sequences $Q_1 \in \Sigma^m$ and $Q_2 \in \Sigma^n$ of length $m$ and $n$, respectively,

$$\mathrm{EDIT}(Q_1, Q_2) \le \mathrm{DTWD}(A^{r_1}_a(Q_1), A^{r_2}_a(Q_2))$$

holds for any two positive integer vectors $r_1$ and $r_2$.
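Lemma 7 can be sanity-checked exhaustively on tiny inputs. The sketch below implements $A^r_a$, DTWD with the 0/1 cost used in this section (assuming the standard dynamic-programming formulation, in which every aligned pair of positions costs 1 iff the symbols differ), and the usual unit-cost edit distance. It then tests the inequality for all strings of length at most 3 over $\{0, 1, 2\}$ with $a = 4$ and a few constant padding vectors. The function names and these parameter choices are illustrative assumptions; the lemma itself holds for arbitrary positive $r_1, r_2$.

```python
from itertools import product


def a_transform(q, r, a):
    """A^r_a(Q) = a^{r_1} q_1 a^{r_2} q_2 ... a^{r_p} q_p a^{r_{p+1}}."""
    if len(r) != len(q) + 1 or min(r) < 1:
        raise ValueError("r must hold len(q) + 1 positive integers")
    out = [a] * r[0]
    for sym, reps in zip(q, r[1:]):
        out.append(sym)
        out.extend([a] * reps)
    return out


def dtwd(s, t):
    """DTW distance with 0/1 cost: an aligned pair of positions costs 1 iff the symbols differ."""
    inf = float("inf")
    D = [[inf] * (len(t) + 1) for _ in range(len(s) + 1)]
    D[0][0] = 0
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            cost = int(s[i - 1] != t[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[-1][-1]


def edit_distance(x, y):
    """Levenshtein distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(y) + 1))
    for i in range(1, len(x) + 1):
        cur = [i] + [0] * len(y)
        for j in range(1, len(y) + 1):
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1,
                         prev[j - 1] + int(x[i - 1] != y[j - 1]))
        prev = cur
    return prev[-1]


def check_lemma7(sigma=(0, 1, 2), a=4, max_len=3):
    """Return a counterexample (q1, q2, k1, k2) to EDIT <= DTWD(A(q1), A(q2)), or None.
    Padding vectors are constant: r_1 = (k1, ..., k1) and r_2 = (k2, ..., k2)."""
    strings = [q for m in range(max_len + 1) for q in product(sigma, repeat=m)]
    for q1, q2 in product(strings, repeat=2):
        for k1, k2 in ((1, 1), (1, 2), (2, 1)):
            s1 = a_transform(q1, [k1] * (len(q1) + 1), a)
            s2 = a_transform(q2, [k2] * (len(q2) + 1), a)
            if edit_distance(q1, q2) > dtwd(s1, s2):
                return (q1, q2, k1, k2)
    return None


print(check_lemma7())
```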


Proof. In this proof, we will use the following equivalent definition of Edit-Distance, which
simplifies the analysis.


Observation 1 ([BI14]). For any two sequences $x, y$, $\mathrm{EDIT}(x, y)$ is equal to the minimum, over all
sequences $z$, of the number of deletions and substitutions needed to transform $x$ into $z$ and $y$ into $z$.


Below we will write $A$ instead of $A^r_a$.

We will show how to convert a traversal of $A(Q_1)$ and $A(Q_2)$ achieving DTWD cost $\mathrm{DTWD}(A(Q_1), A(Q_2))$
into a transformation of $Q_1$ and $Q_2$ into the same sequence. Using Observation 1, we will conclude
that the edit cost of the resulting transformations is at most $\mathrm{DTWD}(A(Q_1), A(Q_2))$, which is
what we need to complete the proof.

Consider an optimal DTWD traversal of $A(Q_1)$ and $A(Q_2)$. At any moment, we say that a
marker in $A(Q_1)$ or in $A(Q_2)$ is of $\Sigma$ type iff the symbol it points to is in $\Sigma$, i.e., it is not equal to
$a$. We say that a symbol is of $\Sigma$ type iff it is in $\Sigma$.

From now on, we consider only the moments during the traversal of $A(Q_1)$ and $A(Q_2)$ when one or
both markers change their type. We can assume that, whenever both markers change
their type simultaneously, the markers have the same type before the change. Indeed, if
this is not the case, we can replace the simultaneous change of type by two consecutive changes of type,
and this modification does not change the cost. Consider any maximal contiguous subsequence of
the sequence of moments during which only one of the markers changes its type (the marker might
change its type more than once during the subsequence). We claim that any such contiguous
subsequence of moments must have even length. Assume that this is not the case, and consider
the earliest such subsequence that has odd length. Consider the types of the markers immediately
before the last moment in the subsequence. Because we considered the first subsequence of
odd length, and both sequences start with symbols that are not of $\Sigma$ type, we get that immediately
before the last moment, both markers have the same type. WLOG, assume that the last
change of type happens to the first marker, and note that immediately after the last change the
markers have different types. At the next moment in the sequence, either both markers change
type (which is impossible, by our observation that before a simultaneous change of type both markers must be of
the same type) or only the second marker changes its type. Thus, we have found two
consecutive moments in the sequence of moments in which the type changes, with the following
three properties.


1. Neither of the two changes of type is simultaneous for both markers;


2. The two changes of type are made by different markers;

3. Before the first change of type, the markers have the same type.

We count the DTWD cost of any traversal as follows. Every jump (performed by one of the markers
or by both markers simultaneously) contributes 1 to the final cost of the traversal iff
the symbols that the markers point at immediately after the jump are different (the contribution is 0
if the symbols are the same). For two symbols $x$ and $y$, $1_{x \ne y}$ is equal to 1 if $x \ne y$ and
to 0 otherwise. We set $x$ to be the symbol that the marker participating in the first
change of type points at after the jump, and $y$ to be the symbol that the marker
participating in the second change of type points at after the jump.

The first change of type contributes 1 to the final cost of $\mathrm{DTWD}(A(Q_1), A(Q_2))$ (we associate each
change of type with the corresponding jump and its contribution), and the second change of
type contributes $1_{x \ne y}$ to the final cost. One can check that the two changes can be replaced
by a single simultaneous change in both sequences by modifying the traversal of $A(Q_1)$ and $A(Q_2)$
(the fact that we can do this follows from the definition of $A$). The simultaneous change costs
$1_{x \ne y}$ and, therefore, we decrease the DTWD cost by 1. This contradicts the assumption that we
consider an optimal traversal. Therefore, the assumption that there exists a maximal contiguous
subsequence of odd length during which only one of the markers changes type
is wrong.

Now we can partition the entire sequence of changes of type into non-overlapping contiguous
subsequences of two kinds.

1. A simultaneous change of type by both markers;

2. Two consecutive changes of type made by the same marker, neither of which is
simultaneous.

We will now show the promised conversion of the DTWD traversal of $A(Q_1)$ and $A(Q_2)$ into an
Edit-Distance transformation of $Q_1$ and $Q_2$ into the same sequence (as in Observation 1) such that
the cost does not increase. This will finish the proof that $\mathrm{EDIT}(Q_1, Q_2) \le \mathrm{DTWD}(A(Q_1), A(Q_2))$.

We analyze both types of subsequences. 

1. From the properties of the partition and the fact that both $A(Q_1)$ and $A(Q_2)$ start with a
symbol that is not of $\Sigma$ type (namely $a$), we get that before and after the change of type both markers are of the
same type.

Case 1. Both markers before the simultaneous change are of $\Sigma$ type. Suppose that the
markers point to symbols $x \in \Sigma$ and $y \in \Sigma$. In this case, we substitute $x$ with $y$
when transforming $Q_1$ and $Q_2$ into the same sequence.

Case 2. Both markers before the simultaneous change are not of $\Sigma$ type. In this case, there is
no corresponding substitution or deletion when transforming $Q_1$ and $Q_2$ into the
same sequence.

In both cases, the actions performed before the conversion (contribution to $\mathrm{DTWD}(A(Q_1), A(Q_2))$)
and after it (contribution to $\mathrm{EDIT}(Q_1, Q_2)$) cost the same.

2. As for the previous kind of subsequence, we conclude that before the first change of
type, the markers are of the same type. We consider both possible cases.

Case 1. Both markers before the first change of type are of $\Sigma$ type. Suppose that the markers
point to symbols $x \in \Sigma$ and $y \in \Sigma$. If $x \ne y$, we substitute $x$ with $y$ when
transforming $Q_1$ and $Q_2$ into the same sequence. If $x = y$, we do nothing.
Case 2. Both markers before the first change of type are not of $\Sigma$ type. WLOG, the first
marker changes its type twice. Before the second change, the first marker points to $x \in \Sigma$.
We delete $x$ when transforming $Q_1$ and $Q_2$ into the same sequence.

We can check that in the first case the cost after the conversion can only be smaller than 
before the conversion. In the second case the costs before (contribution to DTWD) and after 
(contribution to Edit-Distance) the conversion are the same. 


□ 


From now on, $\Sigma = \{0, 1, 2, 3\}$ and $a = 4$.

Lemma 8. For some vectors $r_1$ and $r_2$ with positive, bounded integer coordinates,

$$\mathrm{EDIT}(P'_1, P'_2) \ge \mathrm{DTWD}(A^{r_1}(P'_1), A^{r_2}(P'_2)),$$
where $P'_1$ and $P'_2$ are the sequences defined in Theorem 5.

Proof. We use the notation from Theorem 5. By $A'$ we denote the transformation $A^r$ with $r_i = 1$ for all $i$.
for all i. 

Let $r_3$ be such that for all $k \in \{1, 2\}$,

$$A^{r_3}(VG'_k(\alpha)) = A'(2^T)\, A'(VG_k(\alpha))\, A'(2^T).$$


We set 

A ri (P[) = y4'(3l p 2l)yl r i(Pi)^'(3l p 2l), 

where $r'_1$ is such that

$$A^{r'_1}(P_1) = \bigcirc_{\alpha_1 \in A_1} A^{r_3}(VG'_1(\alpha_1)).$$

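We set

$$A^{r_2}(P'_2) = A^{r'_2}(P_2) = \Big(\bigcirc_{i=1}^{n-1} A^{r_3}(VG'_2(f))\Big) \Big(\bigcirc_{\alpha_2 \in A_2} A^{r_3}(VG'_2(\alpha_2))\Big) \Big(\bigcirc_{i=1}^{n-1} A^{r_3}(VG'_2(f))\Big).$$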

We will use the following lemma to prove the inequality. 

Lemma 9. For vectors $\alpha, \beta \in \{0, 1\}^d$,

$$\mathrm{EDIT}(VG_1(\alpha), VG_2(\beta)) \ge \mathrm{DTWD}(A'(VG_1(\alpha)), A'(VG_2(\beta))).$$

Proof. We consider two cases. 

Case 1. The vectors $\alpha$ and $\beta$ are far. In this case, we traverse the $A'(Z_1 L)$ part of $A'(VG_1(\alpha))$
while the marker in $A'(VG_2(\beta))$ stays at the first symbol. Then, we traverse the remaining part
$A'(V_0 R Z_2)$ of $A'(VG_1(\alpha))$ in parallel with $A'(VG_2(\beta))$. One can check that we achieve DTWD cost
equal to $E_s = \mathrm{EDIT}(VG_1(\alpha), VG_2(\beta))$.

Case 2. The vectors $\alpha$ and $\beta$ are close. In this case, we traverse $A'(Z_1 L V_0)$ and $A'(VG_2(\beta))$ in
parallel. Then, we traverse the $A'(R Z_2)$ part of $A'(VG_1(\alpha))$ while the marker in $A'(VG_2(\beta))$ stays at
the last symbol. One can check that we achieve DTWD cost equal to $E_u = \mathrm{EDIT}(VG_1(\alpha), VG_2(\beta))$.

□ 
We are now ready to prove that

$$\mathrm{EDIT}(P'_1, P'_2) \ge \mathrm{DTWD}(A^{r_1}(P'_1), A^{r_2}(P'_2)).$$

We will exhibit a DTWD traversal of $A^{r_1}(P'_1)$ and $A^{r_2}(P'_2)$ that achieves DTWD cost equal
to $\mathrm{EDIT}(P'_1, P'_2)$. This implies the inequality and finishes the proof.

We proceed by considering two cases. 

Case 1. There are two vectors $\alpha_{i_1}$ and $\beta_{i_2}$ from their respective sets that are far. We traverse
$A'(VG_1(\alpha_{i_1}))$ and $A'(VG_2(\beta_{i_2}))$ as in Lemma 9, achieving cost $E_s$. We traverse the rest of the vector
gadgets of $A^{r_1}(P'_1)$ together with their counterparts from $A^{r_2}(P'_2)$ as in Lemma 9. We traverse the
sequences $A'(2^T)$ in parallel, so they contribute nothing
to the DTWD cost.

We traverse the vector gadgets of $A^{r_2}(P'_2)$ that are not traversed yet as follows. We traverse
the symbols of $\Sigma$ type from $A^{r_2}(P'_2)$ in parallel with the 3 symbols from $A^{r_1}(P'_1)$. This can be done
in such a way that the 4 symbols never contribute to the final DTWD cost.
Some of the 3 symbols from $A^{r_1}(P'_1)$ will still remain untraversed; we traverse them while the
second marker is on the last symbol of $A^{r_2}(P'_2)$ (which is not of $\Sigma$ type).

Computing the cost of this traversal, we get that it is equal to $\mathrm{EDIT}(P'_1, P'_2)$.

Case 2. There is no pair of far vectors. This case is analogous to Case 1; the only difference
is that we do not have two far vectors $\alpha_{i_1}$ and $\beta_{i_2}$ to match, so we choose them arbitrarily and then
proceed as in the previous case. This finishes the analysis of this case. □


5 Hardness for approximating DTWD 


In this section, we prove that approximating DTWD and Frechet in certain settings is ruled out
by SETH. The proofs involve simple modifications of the construction of Bringmann [Bri14] for
proving the hardness of Frechet, and are given in the following lemmas.

Lemma 10. Given two lists $\{\alpha_i\}_{i \in [n]}$ and $\{\beta_i\}_{i \in [n]}$ of vectors $\alpha_i, \beta_i \in \{0, 1\}^d$, we can construct
two sequences $P_1$ and $P_2$ in time $O(nd)$ such that $\mathrm{Frechet}(P_1, P_2) = 0$ if there are two vectors $\alpha_i$
and $\beta_j$ that are orthogonal, and $\mathrm{Frechet}(P_1, P_2) = 1$ otherwise. The sequences of points come from
a pointset with a constant number of points, with a distance function $f$ between points that does not
satisfy the triangle inequality.

Proof. Let Q \ = {si, r\, ti, c\ 0 , c{ 1; cf 0 , cfj} be the poinset that we will use for the construction 
of the first sequence. Let Q 2 = {s 2 , r 2 , t 2 , 1 2 , ej, 0 , C 2 1 , c\ 0 , c| x } be the pointset that we will use 

for the construction of the second sequence. 

We set $f(v_1, v_2) = 0$ for all


(vi,v 2 ) G ({s 2 } x Q 1) U ({si} x (Q 2 \ {* 2 })) u {(n,r 2 )} U ({t 2 } x Q 1) U ({ii} x Q 2 \ {s^}) 

U {( c l,0; c 2 ,o)j ( c l,0; c 2,l)> ( c l,l) c 2,o)} U {( c l,0; c 2,o)> ( c l,0; c 2,l)> ( c l,l) c 2,o)}- 

We set $f(v_1, v_2) = 1$ for all $v_1 \in Q_1$ and $v_2 \in Q_2$ that we did not set yet. The function $f$ is symmetric,
i.e., $f(v_1, v_2) = f(v_2, v_1)$ for all $v_1$ and $v_2$.

We define the coordinate gadget for the first sequence as


CGi(aqj) 


J ( mod 2 ) 
for $i \in [n]$ and $j \in [d]$.

We define the vector gadget for the first sequence as


$$VG_1(\alpha_i) = r_1 \circ \bigcirc_{j \in [d]} CG_1(\alpha_i, j)$$


for $i \in [n]$.

We define the coordinate gadget for the second sequence as

CG 2 (ft,i) = <4

for $i \in [n]$ and $j \in [d]$.

We define the vector gadget for the second sequence as


$$VG_2(\beta_i) = r_2 \circ \bigcirc_{j \in [d]} CG_2(\beta_i, j)$$


for $i \in [n]$.

We define the final sequences 


p i = O;e[n]0i o VGi(a») o^) 


and 

P 2 = s 2 os*o (Oie[n]VG 2 (A)) o t 2 o t 2 . 

The rest of the proof follows from Claims 6 and 7.

Claim 6. If there are vectors $\alpha_i$ and $\beta_j$ that are orthogonal, then $\mathrm{Frechet}(P_1, P_2) = 0$.

Proof. We show a traversal of $P_1$ and $P_2$ that achieves $\mathrm{Frechet}(P_1, P_2) = 0$.

We stay at $s_2$ on $P_2$ and traverse the first sequence until we are at $s_1$ just before $VG_1(\alpha_i)$. We
stay at $s_1$ on $P_1$ and traverse $P_2$ until we are at $r_2$ in $VG_2(\beta_j)$. Now we perform simultaneous
jumps in both sequences until we are at $CG_1(\alpha_i, d)$ on $P_1$ and $CG_2(\beta_j, d)$ on $P_2$. We perform a jump
to $t_1$ on $P_1$. Next, we traverse the rest of $P_2$ until we are at $t_2$ on $P_2$. Now we stay at
$t_2$ on $P_2$ and traverse the rest of $P_1$, and we are done traversing both sequences. One can
check that we achieve $\mathrm{Frechet}(P_1, P_2) = 0$. □


Claim 7. If there are no vectors $\alpha_i$ and $\beta_j$ that are orthogonal, then $\mathrm{Frechet}(P_1, P_2) = 1$.

Proof. We show that we cannot traverse $P_1$ and $P_2$ in a way that achieves $\mathrm{Frechet}(P_1, P_2) = 0$. By
the construction of $f$, this gives $\mathrm{Frechet}(P_1, P_2) = 1$.

Suppose that this is not the case, and consider a traversal of $P_1$ and $P_2$ that achieves $\mathrm{Frechet}(P_1, P_2) = 0$.

There will be a moment during the traversal when we are at $s_2$ on $P_2$. At this very
moment, by the construction of the distance function $f$, we must be at $s_1$ on $P_1$. The next jump is
performed either simultaneously on both sequences or on the second sequence only. In both cases, we end
up at $r_2$ on $P_2$; let us denote this moment by $t$. We claim that there will be a moment
when we are at $r_2$ on $P_2$ and at $r_1$ on $P_1$. If at moment $t$ we are at $r_1$ on $P_1$, we have found
the desired moment. Otherwise, consider the next moment $t'$ after $t$ when we are at $r_1$ on $P_1$. We
claim that at moment $t'$ we are at $r_2$ on $P_2$. Indeed, by the construction of $f$, we must be at
$r_2$ or $t_2$ on $P_2$. We cannot be at $t_2$, because reaching $t_2$ on $P_2$
requires being at $t_1$ on $P_1$. So we have found a moment when we are at $r_1$ on $P_1$ and at $r_2$ on $P_2$.
By the construction of $f$ and the requirement that we achieve Frechet cost 0, we conclude that we
need to traverse the corresponding vector gadgets by simultaneous jumps. Given that there
are no two vectors $\alpha_i$ and $\beta_j$ that are orthogonal, and by the construction of $f$, we get that we cannot
achieve Frechet cost 0. □

□ 
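Frechet here is the discrete variant, defined through traversals in which one or both markers jump. Assuming it is computed with the standard dynamic program, which needs no metric properties of $f$, a minimal sketch follows, together with a toy 0/1 distance on hypothetical point labels (these are not the gadget points $Q_1, Q_2$ of Lemma 10).

```python
def discrete_frechet(p, q, f):
    """Discrete Frechet distance of point sequences p and q under a distance
    function f that need not be a metric: minimize, over monotone joint
    traversals, the largest f-value among the visited position pairs."""
    inf = float("inf")
    m, n = len(p), len(q)
    F = [[inf] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            here = f(p[i], q[j])
            if i == 0 and j == 0:
                F[i][j] = here
                continue
            best_prev = min(
                F[i - 1][j] if i > 0 else inf,                # only the first marker jumps
                F[i][j - 1] if j > 0 else inf,                # only the second marker jumps
                F[i - 1][j - 1] if i > 0 and j > 0 else inf,  # simultaneous jump
            )
            F[i][j] = max(here, best_prev)
    return F[-1][-1]


# Toy 0/1 distance on hypothetical labels; it is not the gadget distance of Lemma 10.
ZERO_PAIRS = {("s", "S"), ("x", "X"), ("t", "T")}


def toy_f(u, v):
    return 0 if (u, v) in ZERO_PAIRS else 1


print(discrete_frechet(["s", "x", "t"], ["S", "X", "T"], toy_f))  # 0
print(discrete_frechet(["s", "t"], ["S", "X", "T"], toy_f))       # 1
```

Because the two possible outcomes in Lemma 10 are exactly 0 and 1, an algorithm whose output is within any constant factor of the true value must return 0 precisely in the orthogonal case, which is what the following corollary exploits.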

An interesting corollary of Lemma 10 is that no constant-factor approximation algorithm for
Frechet can run in truly sub-quadratic time under SETH. However, in the above reduction,
the distance function over the points of the sequences violates the triangle inequality. Next, we
show hardness for $(3 - \varepsilon)$-approximation algorithms when the points come from a metric.

Lemma 11. Given two lists $\{\alpha_i\}_{i \in [n]}$ and $\{\beta_i\}_{i \in [n]}$ of vectors $\alpha_i, \beta_i \in \{0, 1\}^d$, we can construct
two sequences $P_1$ and $P_2$ in time $O(nd)$ such that $\mathrm{Frechet}(P_1, P_2) = 1/2$ if there are two vectors $\alpha_i$
and $\beta_j$ that are orthogonal, and $\mathrm{Frechet}(P_1, P_2) = 3/2$ otherwise. The sequences of points come from a
metric with a constant number of points.

Proof. Let us denote the distance function that satisfies the triangle inequality by $f'$, and let $f$ be the
distance function from Lemma 10. We set $f'(v_1, v_2) := f(v_1, v_2) + 1/2$ for $v_1 \in Q_1$ and $v_2 \in Q_2$.
We set $f'(v_1, v_2) = 1$ for distinct $v_1, v_2$ if $v_1, v_2 \in Q_1$ or $v_1, v_2 \in Q_2$.

One can check that $f'$ satisfies the triangle inequality. □
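The triangle-inequality claim for $f'$ depends only on $f$ taking values in $\{0, 1\}$ between the two point sets. The sketch below checks it exhaustively for every 0/1 function on small hypothetical point sets; it also sets $f'(v, v) = 0$, as a metric requires. Note that the two outcomes of Lemma 11, $3/2$ versus $1/2$, differ by a factor of 3, which is where the $(3 - \varepsilon)$ threshold comes from.

```python
from itertools import product


def lifted_metric(f, Q1):
    """Distance f' from the proof of Lemma 11.  f maps pairs (u, v) with u in Q1
    and v outside Q1 (i.e. in Q2) to 0 or 1.  f' adds 1/2 across the two sets,
    is 1 between distinct points of the same set, and 0 from a point to itself."""
    def d(u, v):
        if u == v:
            return 0.0
        if (u in Q1) == (v in Q1):          # both in Q1 or both in Q2
            return 1.0
        return f[(u, v)] + 0.5 if u in Q1 else f[(v, u)] + 0.5
    return d


def triangle_holds_for_every_binary_f(Q1, Q2):
    """Check d(x, z) <= d(x, y) + d(y, z) for all triples and all 0/1 functions f."""
    cross = [(u, v) for u in Q1 for v in Q2]
    points = list(Q1) + list(Q2)
    for values in product((0, 1), repeat=len(cross)):
        d = lifted_metric(dict(zip(cross, values)), set(Q1))
        if any(d(x, z) > d(x, y) + d(y, z)
               for x, y, z in product(points, repeat=3)):
            return False
    return True


# Hypothetical point labels standing in for Q_1 and Q_2.
print(triangle_holds_for_every_binary_f(["a1", "a2", "a3"], ["b1", "b2"]))  # True
```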

Finally, we observe that this construction implies that if the distance function is arbitrary, then 
DTWD cannot be approximated to within any constant factor in truly sub-quadratic time, under 
SETH, as well. 

Lemma 12. Given two lists $\{\alpha_i\}_{i \in [n]}$ and $\{\beta_i\}_{i \in [n]}$ of vectors $\alpha_i, \beta_i \in \{0, 1\}^d$, we can construct two
sequences $P_1$ and $P_2$ in time $O(nd)$ such that $\mathrm{DTWD}(P_1, P_2) = 0$ if there are two vectors $\alpha_i$ and $\beta_j$
that are orthogonal, and $\mathrm{DTWD}(P_1, P_2) = 1$ otherwise. The sequences of points come from a pointset
with a constant number of points, with a distance function $f$ between points that does not satisfy the
triangle inequality.

Proof. The sequences $P_1$ and $P_2$ are the same as in the proof of Lemma 10.

If there are two vectors $\alpha_i$ and $\beta_j$ that are orthogonal, we traverse the sequences in the same
way as in Claim 6.

If there are no two vectors $\alpha_i$ and $\beta_j$ that are orthogonal, we show that $\mathrm{DTWD}(P_1, P_2) \ge 1$ in
the same way as in Claim 7. The following traversal achieves $\mathrm{DTWD}(P_1, P_2) = 1$: we stay
at $s_1$ on $P_1$ and traverse the entire $P_2$ (this costs 1), and then we stay at $t_2$ and traverse the entire $P_1$ (this
costs 0). □

6 Hardness for k-LCS

In this section, we prove Theorem 2, along with another interesting lower bound for a variant of
k-LCS (Theorem 7).

As in the reduction to LCS, it will be much more convenient to reduce to the weighted version 
of the problem, defined below, as an intermediate step. 




Definition 9 (k-LCS and k-WLCS). An algorithm for the k-LCS problem outputs the answer to the
following question: given k strings of length n over an alphabet $\Sigma$, what is the length of the longest
sequence that appears in all k strings as a subsequence? In k-WLCS we are also given a scoring
function $w : \Sigma \to [K]$, and the goal is to find the common subsequence $X$ of all k strings that
maximizes the sum $\sum_i w(X[i])$.

As before, we can think of the common subsequence as a matching of the strings. We can also 
adapt the previous proof to show a reduction from the weighted version to the unweighted version. 

Lemma 13. Computing the k-WLCS of k strings of length n over $\Sigma$ with weights $w:\Sigma\to[K]$ can be reduced to computing the k-LCS of k strings of length $O(Kn)$ over $\Sigma$.

Proof. The proof is similar to the proof of Lemma 2, where we only had two strings, so we only outline the differences. As before, the reduction maps each symbol $\ell$ into an interval of $w(\ell)$ copies of the same symbol $\ell$. First, we can map a subsequence X of the weighted instance of weight $w(X)$ into a subsequence of length $w(X)$ of the unweighted instance by mapping each symbol of X into an interval. Second, we can modify a subsequence X of the unweighted instance into a subsequence of length at least $|X|$ which has the property that complete intervals are matched in the corresponding matching. Once we have this property, we can contract each interval back into the original weighted symbol that generated it and obtain a subsequence of weight at least $|X|$. As before, these modifications can be done by scanning the strings from left to right and repeatedly converting each matching of parts of intervals into a matching of complete intervals while removing conflicting matches. Each such modification adds $w(\ell)$ k-tuples to the matching and removes up to $w(\ell)$ previously matched k-tuples. The argument here is similar to the one in Lemma 2 and is based on the observation that all conflicting k-tuples must come from the same interval in at least one of the k strings. □
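The symbol-expansion map used in the proof of Lemma 13 is easy to experiment with on toy inputs. In the hedged Python sketch below, `expand` replaces each symbol $\ell$ by $w(\ell)$ copies of itself and `wlcs` is a memoized (weighted) k-LCS recursion; both names are hypothetical, and comparing the two values on a few small instances only illustrates the lemma, it does not prove it.

```python
from functools import lru_cache


def expand(s, w):
    """Map each symbol l of s to an interval of w[l] copies of l."""
    return "".join(ch * w[ch] for ch in s)


def wlcs(strings, w=None):
    """(Weighted) LCS of k strings by memoized recursion; w=None means unit weights."""
    k = len(strings)

    @lru_cache(maxsize=None)
    def go(idx):
        if 0 in idx:
            return 0
        # Drop the last symbol of one prefix ...
        best = max(go(idx[:t] + (idx[t] - 1,) + idx[t + 1:]) for t in range(k))
        # ... or match the k last symbols together when they are all equal.
        last = {s[i - 1] for s, i in zip(strings, idx)}
        if len(last) == 1:
            gain = 1 if w is None else w[next(iter(last))]
            best = max(best, gain + go(tuple(i - 1 for i in idx)))
        return best

    return go(tuple(len(s) for s in strings))


w = {"a": 2, "b": 1, "c": 3}
strs = ["abc", "cab", "acb"]
print(wlcs(strs, w), wlcs([expand(s, w) for s in strs]))
```

On this instance both the weighted value and the unweighted value of the expanded strings are 3, as the lemma predicts.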

6.1 k-Orthogonal-Vectors

We will prove SETH-based lower bounds for problems on k sequences via the orthogonal vectors problem on k lists (see Lemma 14 below).

Definition 10 (k-Orthogonal-Vectors). Given k lists $\{\alpha^t_i\}_{i\in[n]}$ ($t\in[k]$) of vectors $\alpha^t_i\in\{0,1\}^d$, are there k vectors $\alpha^1_{i_1}, \alpha^2_{i_2}, \dots, \alpha^k_{i_k}$ that satisfy $\sum_{h=1}^{d}\prod_{t\in[k]}\alpha^t_{i_t}[h] = 0$? Any collection of vectors $(\alpha^t_{i_t})_{t\in[k]}$ with this property will be called orthogonal.

Definition 11 (k-Most-Orthogonal-Vectors). Given k lists $\{\alpha^t_i\}_{i\in[n]}$ ($t\in[k]$) of vectors $\alpha^t_i\in\{0,1\}^d$ and an integer $r\in\{1,2,\dots,d\}$, are there k vectors $\alpha^1_{i_1}, \alpha^2_{i_2}, \dots, \alpha^k_{i_k}$ that satisfy $\sum_{h=1}^{d}\prod_{t\in[k]}\alpha^t_{i_t}[h] \le r$? The left-hand side of the latter expression will be called the inner product of the k vectors. A collection of vectors that satisfies the property will be called (r-)far, and otherwise it will be called (r-)close.
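The quantities in Definitions 10 and 11 translate directly into code. The following brute-force sketch (our own, with hypothetical names `inner_product` and `find_far`) computes the k-wise inner product $\sum_{h}\prod_{t}\alpha^t_{i_t}[h]$ and searches all $n^k$ choices for an r-far collection in $O(n^k d)$ time; setting $r = 0$ gives k-Orthogonal-Vectors.

```python
from itertools import product
from math import prod


def inner_product(vectors):
    """Sum over coordinates h of the product of the h-th entries of all k vectors."""
    return sum(prod(column) for column in zip(*vectors))


def find_far(lists, r):
    """Exhaustive k-Most-Orthogonal-Vectors: an r-far choice (one vector per list) or None."""
    for choice in product(*lists):
        if inner_product(choice) <= r:
            return choice
    return None


lists = [[(1, 1, 0), (0, 1, 1)], [(1, 0, 1), (1, 1, 0)], [(0, 1, 1), (1, 1, 1)]]
print(find_far(lists, 0))  # an orthogonal triple, found by exhaustive search
```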

Lemma 14. If k-Most-Orthogonal-Vectors on k lists of n vectors in $\{0,1\}^d$ can be solved in $T(n,k,d)$ time, then given a CNF formula on n variables and M clauses, we can compute the maximum number of satisfiable clauses (MAX-CNF-SAT) in $O(T(2^{n/k}, k, M)\cdot\log M)$ time.

Proof. The proof is a generalization of the one for Lemma 1.

Given a CNF formula on n variables and M clauses, split the variables into k sets of size n/k and list all $2^{n/k}$ partial assignments to each set. Define a vector $v(a)$ for each partial assignment a, which contains a 0 at coordinate $j\in[M]$ if a sets any of the literals of the j-th clause of the formula to true, and 1 otherwise. In other words, it contains a 0 if the partial assignment satisfies the clause and 1 otherwise. Now, observe that if $a_t$ ($t\in[k]$) is an assignment for the variables of the t-th set (every set is of size n/k), then the inner product of the vectors $\{v(a_t)\}_{t\in[k]}$ (as defined in Definition 11) is equal to the number of clauses that the combined assignment $(a_1,\dots,a_k)$ does not satisfy. Therefore, to find the assignment that maximizes the number of satisfied clauses, it is enough to find k vectors $v(a_t)$ ($t\in[k]$) whose inner product is minimized. The latter can be easily reduced to $O(\log M)$ calls to an oracle for k-Most-Orthogonal-Vectors on k sets of $N = 2^{n/k}$ vectors each in $\{0,1\}^M$, with a standard binary search over r. □
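The vectors $v(a)$ from this proof can be generated directly, which makes the identity "inner product = number of clauses left unsatisfied" easy to check on small formulas. This Python sketch assumes clauses are lists of nonzero integers in DIMACS style ($i$ for $x_i$, $-i$ for $\neg x_i$) and that k divides n. To stay short, it replaces the k-Most-Orthogonal-Vectors oracle and the binary search over r by an exhaustive minimum over all k-tuples. All function names are hypothetical.

```python
from itertools import product
from math import prod


def partial_vectors(clauses, group):
    """v(a) for every partial assignment a to the variables listed in `group`.

    Coordinate j is 0 if a sets some literal of clause j to true, and 1 otherwise.
    """
    vectors = []
    for bits in product((False, True), repeat=len(group)):
        a = dict(zip(group, bits))
        vectors.append(tuple(
            0 if any(abs(lit) in a and a[abs(lit)] == (lit > 0) for lit in clause) else 1
            for clause in clauses))
    return vectors


def max_sat_via_kmov(clauses, n, k):
    """MAX-CNF-SAT value as M minus the minimum k-wise inner product (k must divide n)."""
    size = n // k
    groups = [list(range(t * size + 1, (t + 1) * size + 1)) for t in range(k)]
    lists = [partial_vectors(clauses, g) for g in groups]
    # Exhaustive stand-in for the oracle calls: fewest clauses left unsatisfied.
    fewest_unsat = min(sum(prod(col) for col in zip(*choice)) for choice in product(*lists))
    return len(clauses) - fewest_unsat


def max_sat_brute(clauses, n):
    """Reference answer obtained by trying all 2^n full assignments."""
    return max(sum(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause) for clause in clauses)
               for bits in product((False, True), repeat=n))
```

For instance, on the contradictory formula `[[1], [-1], [2], [-2]]` with n = k = 2, both functions report that at most 2 clauses can be satisfied.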


6.2 Adapting the reduction 

There are two challenges in adapting the hardness proof for the problem of computing the LCS of two sequences to the problem of computing the LCS of k > 2 sequences: constructing the vector gadgets, and combining the gadgets in a way that implements a selection gadget. We start with the vector gadgets.


Vector gadgets. We will need symbols a, b, c, d with $w(a) = w(b) = w(c) = 1$ and $w(d) = 4^k$. For an integer $p\in\{0,1,2,\dots,2^k-1\}$ we define $v_p\in\{0,1\}^k$ to be the vector containing the binary expansion of p, i.e., $(v_p)_t$ is the t-th bit in the binary expansion of p, for $t\in[k]$. Let the function f satisfy $f(0) = a$ and $f(1) = b$. For $x\in\{0,1\}$, let $\bar{x} := 1 - x$.

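For the t-th set of vectors $\{\alpha^t_i\}_{i\in[n]}$ ($t\in[k]$), $i\in[n]$, and $j\in[d]$, we define the coordinate gadget

$$CG_t(\alpha^t_i, j) = \begin{cases} d\,c\,d\circ\bigcirc_{p=0}^{2^k-2}\big(f((v_p)_t)\circ d\big) & \text{if } (\alpha^t_i)_j = 0,\\ d\,d\circ\bigcirc_{p=0}^{2^k-2}\big(f(\overline{(v_p)_t})\circ d\big) & \text{otherwise.}\end{cases}$$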


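Claim 8. Let $E^C_0 = 2 + 2^k\cdot w(d)$ and $E^C_1 = E^C_0 - 1$. For $j\in[d]$ and $i_1, i_2, \dots, i_k\in[n]$,

$$\mathrm{WLCS}\big(CG_1(\alpha^1_{i_1}, j), CG_2(\alpha^2_{i_2}, j), \dots, CG_k(\alpha^k_{i_k}, j)\big) = \begin{cases} E^C_1 & \text{if } (\alpha^t_{i_t})_j = 1 \text{ for all } t\in[k],\\ E^C_0 & \text{otherwise.}\end{cases}$$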


Proof. The main idea behind the construction of the coordinate gadgets is as follows. Fix $j\in[d]$, consider a collection of k vectors, and consider the j-th coordinate of each of them. Let $c_1, c_2, \dots, c_k$ be such that $c_t$ is equal to the j-th coordinate of the t-th vector. Suppose that for the t-th sequence we set the coordinate gadget corresponding to $c_t$ to be the following sequence. If $c_t = 0$, we take the binary expansions of the integers from 0 to $2^k - 1$, take the t-th bit of each expansion, and concatenate all $2^k$ bits. If $c_t = 1$, we do the same except that we flip all the bits. Now consider the WLCS of all k sequences defined this way. For now, assume that we do not align symbols that have different indices, i.e., for two sequences $\sigma'$ and $\sigma''$, we are allowed to align $\sigma'[h']$ and $\sigma''[h'']$ iff $h' = h''$. (We remove this assumption below.) It is easy to see that the WLCS is always equal to 2, independently of the values of the $c_t$: the symbols at position p agree in all k sequences exactly when $v_p = (c_1,\dots,c_k)$ or $v_p = (\bar{c}_1,\dots,\bar{c}_k)$. Now let us modify the coordinate gadgets as follows. Instead of concatenating the bits corresponding to the integers from 0 to $2^k - 1$, we concatenate the bits for the integers from 0 to $2^k - 2$. We can check that the WLCS is now always equal to 2, except when all the $c_t$ bits are equal (i.e., $c_t = 0$ for all $t\in[k]$ or $c_t = 1$ for all $t\in[k]$), in which case the WLCS is equal to 1. We want the construction of the coordinate gadgets to satisfy the following property: if there exists $t\in[k]$ with $c_t = 0$, then the WLCS is equal to some fixed large value, while if $c_t = 1$ for all $t\in[k]$, then the WLCS is equal to some fixed small value. Our current construction almost satisfies this property; it remains to make the WLCS equal to 2 when $c_t = 0$ for all $t\in[k]$. We can do that as follows: we take the previous construction and prepend a special symbol c to the binary sequence if $c_t = 0$. We can check that the construction satisfies the needed property under the stated assumption. We proceed by showing that the actual definition of the coordinate gadgets removes the need for the assumption.

We want to match all the d symbols from every sequence, since if we do not, we end up with a WLCS cost that is less than $E^C_1$. We proceed by assuming that we match all the d symbols; the d separators then force the symbols lying between the same pair of consecutive d's to be aligned with each other, which is exactly the assumption above. We can now check that we have two matches if not all the vectors have a 1 at the j-th coordinate, while we have one match otherwise. □
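Claim 8 can be checked by brute force for small k. The Python sketch below builds the coordinate gadgets under our reading of the definition above, with $w(d) = 4^k$ and with $(v_p)_t$ taken to be bit $t-1$ of p counting from the least significant bit (an assumption; since the excluded integer $2^k - 1$ is the all-ones vector under any bit order, the gap does not depend on this choice), and runs a weighted k-LCS dynamic program on every choice of the k coordinate values. It only checks the relative statement of the claim, namely that the all-ones case is exactly one less than every other case, and not the absolute value of $E^C_0$. The helper names are hypothetical.

```python
from itertools import product


def coordinate_gadget(t, k, bit):
    """CG_t(alpha, j) when the j-th coordinate of alpha equals `bit` (t is 1-based)."""
    f = {0: "a", 1: "b"}
    out = "dcd" if bit == 0 else "dd"
    for p in range(2 ** k - 1):            # p = 0, 1, ..., 2^k - 2
        vpt = (p >> (t - 1)) & 1           # assumed reading of (v_p)_t: LSB-first bit order
        out += f[vpt if bit == 0 else 1 - vpt] + "d"
    return out


def wlcs(strings, w):
    """Weighted LCS of k strings via the DP over tuples of prefix lengths."""
    k = len(strings)
    dp = {}
    for idx in product(*(range(len(s) + 1) for s in strings)):
        if 0 in idx:
            dp[idx] = 0
            continue
        best = max(dp[idx[:t] + (idx[t] - 1,) + idx[t + 1:]] for t in range(k))
        last = {s[i - 1] for s, i in zip(strings, idx)}
        if len(last) == 1:
            best = max(best, dp[tuple(i - 1 for i in idx)] + w[last.pop()])
        dp[idx] = best
    return dp[tuple(len(s) for s in strings)]


def coordinate_values(k):
    """WLCS of the k coordinate gadgets for every choice of the k coordinate bits."""
    w = {"a": 1, "b": 1, "c": 1, "d": 4 ** k}
    return {bits: wlcs([coordinate_gadget(t + 1, k, b) for t, b in enumerate(bits)], w)
            for bits in product((0, 1), repeat=k)}


for k in (2, 3):
    vals = coordinate_values(k)
    print(k, sorted(set(vals.values())))   # two values, differing by exactly 1
```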

Let e be a symbol with $w(e) = 100\cdot E^C_0$.

For the t-th set of vectors $\{\alpha^t_i\}_{i\in[n]}$ ($t\in[k]$) and $i\in[n]$, we define the vector gadget

$$VG'_t(\alpha^t_i) = e\circ\bigcirc_{j\in[d]}\big(CG_t(\alpha^t_i, j)\circ e\big).$$


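Let $E_0 = (d-r)\cdot E^C_0 + r\cdot E^C_1$ and $E_1 = E_0 - 1$.

Claim 9. For $i_1,\dots,i_k\in[n]$,

$$\mathrm{WLCS}\big(VG'_1(\alpha^1_{i_1}),\dots,VG'_k(\alpha^k_{i_k})\big)\ \begin{cases}\ge E_0 & \text{if } \alpha^1_{i_1},\dots,\alpha^k_{i_k} \text{ are } r\text{-far},\\ \le E_1 & \text{otherwise.}\end{cases}$$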


Proof. As in the proof of Claim 8, we can conclude that in an optimal matching we use all the e symbols from all the sequences; if this is not so, then the WLCS score is less than $E_1$.

Using Claim 8, we can check that the WLCS cost is at least $E_0$ if the vectors $\alpha^1_{i_1}, \alpha^2_{i_2}, \dots, \alpha^k_{i_k}$ are r-far. Also, we can check that if the vectors are r-close, then the WLCS cost is at most $E_1$. □
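A brute-force experiment illustrates Claim 9 for k = 2. Assuming $w(d) = 4^k$, $w(e) = 100\cdot E^C_0$ with $E^C_0 = 2 + 2^k\cdot w(d)$ as stated in Claim 8, and the least-significant-bit-first reading of $(v_p)_t$, the Python sketch checks that $\mathrm{WLCS}(VG'_1,\dots,VG'_k)$ plus the k-wise inner product of the underlying vectors takes the same value for every choice of vectors. In other words, the WLCS drops by exactly one for each coordinate in which all k vectors have a 1, which gives the far/close dichotomy of Claim 9 at any threshold r. All names are hypothetical.

```python
from itertools import product
from math import prod


def coordinate_gadget(t, k, bit):
    """CG_t(alpha, j) when the j-th coordinate of alpha equals `bit` (t is 1-based)."""
    f = {0: "a", 1: "b"}
    out = "dcd" if bit == 0 else "dd"
    for p in range(2 ** k - 1):            # p = 0, ..., 2^k - 2
        vpt = (p >> (t - 1)) & 1           # assumed LSB-first reading of (v_p)_t
        out += f[vpt if bit == 0 else 1 - vpt] + "d"
    return out


def vector_gadget(t, k, vec):
    """VG'_t(alpha): e, then CG_t(alpha, j) followed by e for j = 1, ..., d."""
    return "e" + "".join(coordinate_gadget(t, k, bit) + "e" for bit in vec)


def wlcs(strings, w):
    """Weighted LCS of k strings via the DP over tuples of prefix lengths."""
    k = len(strings)
    dp = {}
    for idx in product(*(range(len(s) + 1) for s in strings)):
        if 0 in idx:
            dp[idx] = 0
            continue
        best = max(dp[idx[:t] + (idx[t] - 1,) + idx[t + 1:]] for t in range(k))
        last = {s[i - 1] for s, i in zip(strings, idx)}
        if len(last) == 1:
            best = max(best, dp[tuple(i - 1 for i in idx)] + w[last.pop()])
        dp[idx] = best
    return dp[tuple(len(s) for s in strings)]


def wlcs_plus_inner_product(k, d):
    """Set of values WLCS(VG'_1, ..., VG'_k) + inner product, over all k-tuples in {0,1}^d."""
    w = {"a": 1, "b": 1, "c": 1, "d": 4 ** k}
    w["e"] = 100 * (2 + 2 ** k * w["d"])   # w(e) = 100 * E_0^C
    values = set()
    for vecs in product(product((0, 1), repeat=d), repeat=k):
        gadgets = [vector_gadget(t + 1, k, v) for t, v in enumerate(vecs)]
        ip = sum(prod(col) for col in zip(*vecs))
        values.add(wlcs(gadgets, w) + ip)
    return values


print(wlcs_plus_inner_product(2, 3))   # a single value: WLCS + inner product is constant
```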


Let f be a symbol with $w(f) = E_1$. For a vector $\alpha$ we define

$$VG_1(\alpha) = f\circ VG'_1(\alpha), \qquad VG_t(\alpha) = VG'_t(\alpha)\circ f \quad \text{for } t\in\{2,3,\dots,k\}.$$

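Claim 10. For $i_1,\dots,i_k\in[n]$,

$$\mathrm{WLCS}\big(VG_1(\alpha^1_{i_1}), VG_2(\alpha^2_{i_2}), \dots, VG_k(\alpha^k_{i_k})\big)\ \begin{cases}\ge E_0 & \text{if } \alpha^1_{i_1},\dots,\alpha^k_{i_k} \text{ are } r\text{-far},\\ = E_1 & \text{otherwise.}\end{cases}$$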


Proof. If the vectors $\alpha^1_{i_1}, \alpha^2_{i_2}, \dots, \alpha^k_{i_k}$ are r-far, we have a WLCS cost of at least $E_0$ as in Claim 9, without using any of the f symbols. We cannot achieve a larger score than $E_0$ by using the f symbols, since a matching that uses them has weight exactly $w(f) = E_1 < E_0$.

If the vectors are r-close and we do not use any f symbols, the maximum cost is at most $E_1$ by Claim 9. If it is less than that, we can use the f symbols and achieve a score of $E_1$. Notice that if we use the f symbols, we cannot use any other symbol in any matching. □

Combining the vector gadgets. A very simple padding strategy implies the lower bound for a variant of k-LCS.

Definition 12 (Local-k-LCS). Given k strings of length n over an alphabet $\Sigma$ and an integer L, what is the length of the longest sequence X such that there are k substrings of length L, one from each input string, such that X is a common subsequence of each of these substrings?

In words, we are looking for substrings of length L for which the LCS score is maximized. 

Theorem 7. If Local-k-LCS on strings of length n over an alphabet of size $O(1)$ can be solved in $O(n^{k-\varepsilon})$ time, for some $\varepsilon > 0$, then SETH is false.

Theorem 7 follows from the following reduction. We note that in the constructed instances, L is always polylogarithmic in the lengths of the sequences, and therefore the problem can easily be solved in $\tilde{O}(n^k)$ time. This problem is closely related to the Normalized-LCS problem, which was studied in [AEP01, EL04] and for which an $n^{2-o(1)}$ lower bound based on SETH was shown in [AVW14].
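The naive algorithm mentioned above tries every k-tuple of length-L windows and runs a k-LCS dynamic program on each, for $(n-L+1)^k$ subproblems of size $O(L^k)$. A minimal Python sketch with hypothetical names `lcs` and `local_k_lcs`, assuming every input string has length at least L:

```python
from itertools import product


def lcs(strings):
    """Unweighted k-LCS via the dynamic program over tuples of prefix lengths."""
    k = len(strings)
    dp = {}
    for idx in product(*(range(len(s) + 1) for s in strings)):
        if 0 in idx:
            dp[idx] = 0
        elif len({s[i - 1] for s, i in zip(strings, idx)}) == 1:
            # All last symbols agree: matching them is always optimal (unit weights).
            dp[idx] = dp[tuple(i - 1 for i in idx)] + 1
        else:
            dp[idx] = max(dp[idx[:t] + (idx[t] - 1,) + idx[t + 1:]] for t in range(k))
    return dp[tuple(len(s) for s in strings)]


def local_k_lcs(strings, L):
    """Best k-LCS over all choices of one length-L window per string."""
    starts = [range(len(s) - L + 1) for s in strings]
    return max(lcs([s[p:p + L] for s, p in zip(strings, pos)])
               for pos in product(*starts))


print(local_k_lcs(["abcab", "cabca", "bcabc"], 3))  # the window "abc" occurs in all three
```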

Lemma 15. k-Most-Orthogonal-Vectors on k lists of N vectors in $\{0,1\}^M$ can be reduced to Local-k-LCS on k strings of length $2^k\cdot N\cdot M^{O(1)}$ over an alphabet of size $O(1)$.

Proof. We construct k lists of vector gadgets from our k lists of vectors as in the above discussion. By the reduction of Lemma 13 from WLCS to LCS, we can convert each vector gadget $VG_t(\alpha^t)$ into a longer string $UVG_t(\alpha^t)$ such that what we proved for WLCS in Claim 10 holds for LCS instead. Let L be the length of the longest vector gadget $UVG_t(\alpha^t)$ that we create in this process. We also introduce two new symbols x, y. The first string is defined as $P_1 = \bigcirc_{i=1}^{N}\big(UVG_1(\alpha^1_i)\circ x^L\big)$, while the other k − 1 strings are $P_t = \bigcirc_{i=1}^{N}\big(UVG_t(\alpha^t_i)\circ y^L\big)$, for $t = 2,\dots,k$. To complete the reduction, we claim that if the input is a YES instance of k-Most-Orthogonal-Vectors, there will be k substrings of length L with LCS at least $E_0$, namely the k vector gadgets corresponding to the r-far vectors, while otherwise the maximum score of any k substrings is $E_1$. The latter part is implied by Claim 10 and by noting that the x, y parts can never be matched, and they are long enough to prevent any substring of length L from containing symbols from more than one vector gadget. □

Next, we focus on the classic k-LCS problem and show how to implement the selection gadget while making the existence of orthogonal vectors influence the LCS in a manageable way. Unfortunately, we are not able to do this without introducing $O(k)$ new symbols to the alphabet.

Our lower bound for k-LCS (Theorem 2) follows from the following reduction.

Lemma 16. For any $k\ge 2$, k-Most-Orthogonal-Vectors on k lists of n vectors in $\{0,1\}^d$ can be reduced to k-LCS on k strings of length $k^{O(k)}\cdot n\cdot d^{O(1)}$ over an alphabet of size $O(k)$.

Proof. We will show a reduction to k-WLCS and use Lemma 13 to conclude the proof.

We construct k lists of vector gadgets from our k lists of vectors as in the above discussion. Let D be the maximum possible sum of weights of all symbols in any vector gadget, and note that $D = \mathrm{poly}(2^k, d)$ and that $D > E_0$. For $i\in\{2,\dots,k\}$ we introduce a new symbol $3_i$ to the alphabet and set $w(3_i) = B_i$, where $B_k = B = (10kD)^2$ and $B_i = 2k\cdot B_{i+1}$ for $2\le i<k$. Finally, we add two new symbols 0 and 2 and set $w(0) = w(2) = C = 10k^2B_2$. The weights achieve $C \gg B_2 \gg \dots \gg B_k = B \gg D \gg E_0$.

Our k strings are defined as follows. For $i\in[k]$,

$$P_i = (3_{i+1}\cdots 3_k)^Q \circ (3_2\cdots 3_i) \circ \big(VG''_i(\alpha^i_1)\big)^{(i-1)n} \circ \bigcirc_{t\in[n]} VG''_i(\alpha^i_t) \circ \big(VG''_i(\alpha^i_1)\big)^{(i-1)n} \circ (3_{i+1}\cdots 3_k)^Q,$$

where $VG''_1(x) = 0\circ VG_1(x)\circ 2$, $VG''_i(x) = 0\circ VG_i(x)\circ 2\circ(3_2\cdots 3_i)$ if $i\ge 2$, and $Q = |P_k|$.
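The weight hierarchy used in this proof can be instantiated with concrete numbers. The sketch below (hypothetical helper `padding_weights`) computes $B_k,\dots,B_2$ and C from k and a sample value of D, following $B_k = (10kD)^2$, $B_i = 2k\cdot B_{i+1}$ and $C = 10k^2B_2$ as defined above, and checks the ordering $C > B_2 > \dots > B_k > D$ together with the geometric-sum bound $\sum_{i=2}^{k}B_i < 2B_2$. The value of D is arbitrary here, not one produced by the construction.

```python
def padding_weights(k, D):
    """Return ({i: B_i for 2 <= i <= k}, C) for the padding symbols 3_i and 0/2."""
    B = {k: (10 * k * D) ** 2}           # B_k = B = (10kD)^2
    for i in range(k - 1, 1, -1):        # B_i = 2k * B_{i+1}, for i = k-1, ..., 2
        B[i] = 2 * k * B[i + 1]
    C = 10 * k * k * B[2]                # w(0) = w(2) = C = 10 k^2 B_2
    return B, C


B, C = padding_weights(3, 1)
print(B, C)   # B_3 = 900, B_2 = 5400, C = 486000
```

Each ratio $B_i/B_{i+1} = 2k$ is what makes $\sum_{i=2}^k B_i$ at most a $(1 + 1/(2k-1))$ factor larger than $B_2$.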

The intuition behind this padding is that we want to force any optimal matching to match all n vector gadgets of the first string to precisely n vector gadgets from each other string. This is achieved because if at least one vector gadget from $P_1$ is not matched, we lose some 0 or 2 symbols that we could have matched, while if more than n vector gadgets of some $P_i$ are matched, we lose at least one $3_i$ symbol. In addition, as long as we match n consecutive intervals from each string, we get the same score from the padding, and therefore the optimal matching is determined by the existence of an r-far set of vectors. The WLCS will be at most E if there are no r-far vectors, and at least E + 1 if there are, for an appropriately defined E.

To make this argument more formal, we can follow the steps in the proof of Lemma 4 for the LCS of two strings. First, we can prove an analog of the corresponding claim from that proof, stating that matching $n'$ intervals (vector gadgets) in some $P_t$, for some $n' > n$, can only contribute up to $(n' - n)(B - 1)$ additional weight to the score. Then, we observe that by the padding construction, if $n' > n$ then we will not be able to match at least $(n' - n)$ of the $3_i$ symbols that we could have matched if $n'$ were equal to n, which incurs a loss of at least $(n' - n)B$. Therefore, in an optimal matching, exactly n intervals will be matched in each sequence, and it is easy to see that the score is then determined by the existence of an r-far set of vectors.

Let $E_U = 2C + E_1$ and $E_G = n\cdot E_U + B_2 + (2n+1)\cdot\sum_{i=2}^{k} B_i$. The following two lemmas prove that there is a gap in the WLCS of our k sequences between the case where there is a collection of k vectors that are r-far and the case where there is none.

Lemma 17. If there is a collection of k vectors that are r-far, then $\mathrm{WLCS}(P_1,\dots,P_k) \ge E_G + 1$.

Proof. Let $t_1,\dots,t_k$ be such that the k vectors $(\alpha^i_{t_i})_{i=1}^{k}$ are r-far.

First, match the corresponding gadgets $(VG_i(\alpha^i_{t_i}))_{i=1}^{k}$, along with the 0 and 2 symbols surrounding each of these gadgets, to get a weight of at least $2C + E_0 = 2C + E_1 + 1 = E_U + 1$, by Claim 10.

Then, match the $t_1 - 1$ vector gadgets (and the surrounding 0, 2 symbols) to the left of $VG''_1(\alpha^1_{t_1})$ to the $t_1 - 1$ vector gadgets immediately to the left of $VG''_i(\alpha^i_{t_i})$, for every $i\in\{2,\dots,k\}$, and similarly, match the $n - t_1$ gadgets to the right. The total additional weight we get is at least $(n-1)\cdot E_U$.

Then, note that after the above matches, only $(n-1)$ out of the $(3n+1)$ $3_2$-symbols in $P_2$ are surrounded by matched symbols. The remaining $(2n+2)$ $3_2$-symbols can be matched, giving an additional weight of $(2n+2)\cdot B_2$, as follows. Consider the leftmost matched 0 in $P_2$, call it x, and assume there are m $3_2$-symbols to the left of it in $P_2$. Match these $3_2$-symbols to the m such symbols in each other string $P_i$ that appear immediately to the left of the symbol that is matched to our x. By construction, and since m can be at most n, we know that there are enough matchable $3_2$ symbols in the other strings.

Then, similarly, note that at this point only $3n$ out of the $(5n+1)$ $3_3$-symbols in $P_3$ are surrounded by matched symbols. The remaining $(2n+1)$ $3_3$-symbols can be matched, as above, for an additional weight of $(2n+1)\cdot B_3$. In general, we perform this process for $i = 3,\dots,k$, and at the i-th stage only $(2(i-2)n + n)$ out of the $(2(i-1)n + n + 1)$ $3_i$-symbols in $P_i$ are surrounded by matched symbols, and we can match the remaining ones to get an additional weight of $(2n+1)\cdot B_i$. Thus, the total contribution of the $3_i$ symbols is $B_2 + \sum_{i=2}^{k}(2n+1)B_i$.

The total weight of our matching is at least

$$E_U + 1 + (n-1)\cdot E_U + B_2 + (2n+1)\cdot\sum_{i=2}^{k} B_i = E_G + 1. \qquad \square$$
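For the record, the final sum regroups the weight $(2n+2)B_2$ obtained from the $3_2$-symbols as $B_2 + (2n+1)B_2$, so that every $B_i$ appears with the same coefficient:

$$\underbrace{(E_U + 1) + (n-1)E_U}_{\text{gadgets}} + \underbrace{(2n+2)B_2 + \sum_{i=3}^{k}(2n+1)B_i}_{\text{padding}} = nE_U + B_2 + (2n+1)\sum_{i=2}^{k}B_i + 1 = E_G + 1.$$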

The hard part is upper bounding the score when there is no collection of r-far vectors, and we will spend the rest of the proof towards this end.

Lemma 18. If there is no collection of k vectors that are r-far, then $\mathrm{WLCS}(P_1,\dots,P_k) \le E_G$.

Proof. Consider any optimal matching of our k strings. The goal is to bound its score by $E_G$. Our plan is to divide the contribution to the score into two parts: (a) the contribution of the vector gadgets, and (b) the contribution from the padding, i.e., the $3_i$ symbols. In any matching, there is a tradeoff between the scores from (a) and (b): the more vector gadgets we align, the fewer $3_i$'s we can match, and vice versa. We will prove upper bounds for both contributions and show that they imply an upper bound of $E_G$ on the total score.

We start by formally defining (a) and upper bounding it. 

For each string $P_i$, let $s_i$ and $t_i$ be, respectively, the first 0 symbol and the last 2 symbol from $P_i$ that are matched in our optimal matching, if they exist. A simple observation is that if some 0 symbol is matched in the optimal matching ($s_i$ exists for all $i\in[k]$), then there must exist some 2 symbol that is also matched: otherwise, match the 2 immediately following that 0 and note that any conflicting matches must come from inside the vector gadgets, and therefore removing all of them will decrease the score by much less than $w(2)$. Thus, we can define $N_i$ to be the number of vector gadgets that lie between $s_i$ and $t_i$, and if such $s_i, t_i$ do not exist, we set $N_i = 0$. By construction, $N_i \le 2(i-1)n + n$ for all $i\in[k]$. Note that $(s_1,\dots,s_k)$ and $(t_1,\dots,t_k)$ must be in our matching.

We will assume that $N_i \ge 1$ for all i, since the only other case is that $N_i = 0$ for all $i\in[k]$, which can easily be seen to be sub-optimal: in this case, only $3_i$ symbols are matched, and there cannot be more than $(2(i-1)n + n + 1)$ matched $3_i$ symbols for any $i\in\{2,\dots,k\}$, which implies the following upper bound on the score:

$$\sum_{i=2}^{k}\big(2(i-1)n + n + 1\big)B_i < 3kn\sum_{i=2}^{k}B_i < 6knB_2 < n\cdot C < E_G.$$

By construction, there are no $3_i$ symbols between $s_1$ and $t_1$, which implies that the matching in between $(s_1,\dots,s_k)$ and $(t_1,\dots,t_k)$ does not contain any $3_i$ symbols. The total contribution of this part is what we call (a) above. On the other hand, the matching to the left of $(s_1,\dots,s_k)$ and to the right of $(t_1,\dots,t_k)$ cannot contain anything besides $3_i$ symbols: if some symbol $\sigma\notin\{0, 3_2,\dots,3_k\}$ appears in $P_i$ before $s_i$ and is matched, then the 0's that appear right before the matched $\sigma$'s could have been matched together without any conflicts, which contradicts the optimality of the matching. An analogous argument shows that $t_i$ is to the right of any matched $\sigma\notin\{2, 3_2,\dots,3_k\}$. Thus, the contribution of part (b) comes only from $3_i$ symbols.

This motivates the following definitions. From now on, we refer to the sequences composed of the vector gadgets surrounded by 0, 2 as “intervals”, i.e., sequences of the form $0\circ VG_i(x)\circ 2$. Consider the substring between $s_i$ and $t_i$ in each string $P_i$, remove any $3_j$ symbols in it (since they are not matched anyway), and note that we obtain a concatenation of $N_i$ intervals. Moreover, by our assumption that there is no collection of r-far vectors, we know that for any choice of one interval from each string, the k-LCS is upper bounded by $E_U = 2C + E_1$, by Claim 10. The main quantity we will be interested in is $W(L_1,\dots,L_k)$, which is defined to be the maximum score of a matching of any k strings $T_1,\dots,T_k$ such that $T_i$ is the concatenation of $L_i$ intervals and, for any choice of one interval from each $T_i$, the optimal score is $E_U$. By the symmetry of k-LCS, we can assume WLOG that $L_1\le\dots\le L_k$ (otherwise we reorder). To get the desired upper bound on $W(L_1,\dots,L_k)$, it will be convenient to first upper bound $W_0(L_1,\dots,L_k)$, which is defined in a similar way, except that we require the matching to match all 0 and 2 symbols from $T_1$, i.e., the string with the fewest intervals.

Define $E_B = 2C + D$, which is an upper bound on the maximum possible total weight of all the symbols in an interval. The following key inequality, which we will use multiple times in the proof, follows from the fact that the 0/2 symbols are much more important than the rest.

Fact 1. Our parameters satisfy $E_B < E_U + (B-1)/(k-1)$.

Proof. Follows since $(k-1)(E_B - E_U) < (k-1)D < B - 1$, by our choice of parameters. □

Claim 11. For any integers $1\le L_1\le\dots\le L_k$, we can upper bound $W_0(L_1,\dots,L_k) \le L_1\cdot E_U + (L_k - L_1)\cdot(B-1)$.

Proof. Let $T_1,\dots,T_k$ be any k sequences with $L_1,\dots,L_k$ intervals, respectively, that satisfy the assumption in the definition of $W_0$. Consider an optimal matching of the k sequences in which all the 0 and 2 symbols of $T_1$ are matched; we will upper bound its weight $E_P$ by $L_1\cdot E_U + (L_k - L_1)\cdot(B-1)$, which will prove the claim. Note that in such a matching, for any $i\in\{2,\dots,k\}$, each interval of $T_1$ must be matched completely within one or more intervals of $T_i$, and each interval of $T_i$ has matches to at most one interval from $T_1$ (otherwise, some 0 or 2 symbol in $T_1$ would not be matched).

We upper bound the weight of the matching by considering two kinds of intervals in $T_1$ and upper bounding their contributions. Let x be the number of intervals of $T_1$ that contribute at most $E_U$ to the weight of our optimal matching, and call the other $(L_1 - x)$ intervals “full”. Note that any full interval must be matched to a substring of $T_i$, for some $i\in\{2,\dots,k\}$, that contains at least two intervals, for the following reason. The 0 and 2 symbols of the interval of $T_1$ must be matched, and if the matching stays within a single interval of $T_i$ for all $i\in\{2,\dots,k\}$ and has more than $E_U$ weight, then we have a contradiction to the assumption that no k intervals, one from each string, can have a k-LCS score greater than $E_U$. Thus, we have x intervals consuming at least 1 interval from every $T_i$, and $(L_1 - x)$ full intervals consuming at least 1 interval from every $T_i$ and at least 2 intervals from some $T_i$. Using the fact that the total number of intervals in $T_2,\dots,T_k$ is $L_2 + \dots + L_k \le (k-1)L_k$, we get the condition

$$(k-1)\cdot x + k\cdot(L_1 - x) \le (k-1)L_k.$$

We can now give an upper bound on the weight of our matching by summing the contributions of the intervals of $T_1$. There are x intervals contributing at most $E_U$ weight each, and there are $(L_1 - x)$ intervals with unbounded contribution, but we know that even if all the symbols of an interval are matched, it can contribute at most $E_B$. Therefore, the total weight of the matching can be upper bounded by

$$E_P \le (L_1 - x)\cdot E_B + x\cdot E_U.$$

We claim that no matter what x is, as long as the above condition holds, this expression is at most $L_1\cdot E_U + (L_k - L_1)\cdot(B-1)$.

To maximize this expression, we choose the smallest possible x that satisfies the above condition, since $E_B > E_U$; this gives $x = \max\{0,\ kL_1 - (k-1)L_k\}$.

First, consider the case where $L_k \ge L_1\cdot\frac{k}{k-1}$, and therefore $x = 0$, which means that all the intervals of $T_1$ might be fully matched. Using Fact 1 and the fact that $L_k - L_1 \ge L_1/(k-1)$, we get the desired upper bound:

$$E_P \le L_1\cdot E_B < L_1\cdot\big(E_U + (B-1)/(k-1)\big) \le L_1\cdot E_U + (L_k - L_1)\cdot(B-1).$$

Now, assume that $L_k < L_1\cdot\frac{k}{k-1}$, and therefore $x = kL_1 - (k-1)L_k$. In this case, when setting x as small as possible, the upper bound becomes

$$E_P \le \big((k-1)L_k - (k-1)L_1\big)\cdot E_B + \big(kL_1 - (k-1)L_k\big)\cdot E_U = L_1\cdot E_U + (k-1)(L_k - L_1)\cdot(E_B - E_U),$$

which, by Fact 1, is at most $L_1\cdot E_U + (L_k - L_1)\cdot(B-1)$. □
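The case analysis above can be sanity-checked numerically: for parameters satisfying Fact 1, every admissible x (one that satisfies the counting condition) should give $(L_1 - x)E_B + xE_U \le L_1E_U + (L_k - L_1)(B-1)$. The Python sketch below, with the hypothetical name `claim11_bound_holds`, enumerates small $L_1\le L_k$ and all integer x; the parameter values in the usage line are arbitrary choices that satisfy Fact 1, not values produced by the construction.

```python
def claim11_bound_holds(k, E_U, E_B, B, max_len=40):
    """Check (L1 - x)*E_B + x*E_U <= L1*E_U + (Lk - L1)*(B - 1) for all admissible x."""
    # Fact 1, rearranged: (k - 1) * (E_B - E_U) < B - 1.
    assert (k - 1) * (E_B - E_U) < B - 1, "parameters must satisfy Fact 1"
    for L1 in range(1, max_len + 1):
        for Lk in range(L1, max_len + 1):
            bound = L1 * E_U + (Lk - L1) * (B - 1)
            for x in range(L1 + 1):
                # x intervals of T_1 contribute at most E_U; the other L1 - x are "full".
                if (k - 1) * x + k * (L1 - x) > (k - 1) * Lk:
                    continue  # violates the counting condition from the proof
                if (L1 - x) * E_B + x * E_U > bound:
                    return False
    return True


print(claim11_bound_holds(3, 2_000_003, 2_000_005, 22_500))  # True
```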

We are now ready to upper bound the more general $W(L_1,\dots,L_k)$.

Claim 12. For any integers $1\le L_1\le\dots\le L_k$, we can upper bound $W(L_1,\dots,L_k) \le L_1\cdot E_U + (L_k - L_1)\cdot(B-1)$.

Proof. We will prove by induction on $\ell\ge k$ that for all $1\le L_1\le\dots\le L_k$ such that $L_1 + \dots + L_k \le \ell$,

$$W(L_1,\dots,L_k) \le L_1\cdot E_U + (L_k - L_1)\cdot(B-1).$$

The base case is when $\ell = k$ and $L_1 = \dots = L_k = 1$. Then $W(1,\dots,1) = E_U$, by the assumption on the strings in the definition of W, and we are done.

For the inductive step, assume that the statement is true for all $\ell' \le \ell - 1$, and we will prove it for $\ell$. Let $L_1,\dots,L_k$ be such that $1\le L_1\le\dots\le L_k$ and $L_1 + \dots + L_k = \ell$, and let $T_1,\dots,T_k$ be sequences with the corresponding numbers of intervals. Consider the optimal (unrestricted) matching of $T_1,\dots,T_k$, and denote its weight by $E_P$. Our goal is to show that $E_P \le L_1\cdot E_U + (L_k - L_1)\cdot(B-1)$.

If every 0/2 symbol in $T_1$ is matched, then, by definition, the weight cannot be more than $W_0(L_1,\dots,L_k)$, and by Claim 11 we are done. Otherwise, consider the first unmatched 0/2 symbol in $T_1$, call it x; there are two cases.

The x = 0 case: If x is the first 0 in $T_1$, then for some $i\in\{2,\dots,k\}$, the first 0 in $T_i$ must be matched to some 0 after x (otherwise we can add a 0 to the matching without violating any other matches), which implies that none of the symbols in the interval starting at x can be matched, since such matches would conflict with the match that contains this first 0. Otherwise, consider the 2 that appears right before x, call it y, and note that it must be matched to some 2-symbols $y_i$ in $T_i$ for every $i\in\{2,\dots,k\}$, by our choice of x as the first unmatched 0/2 symbol in $T_1$. Now there are two possibilities: either for some $i\in\{2,\dots,k\}$, $y_i$ is the very last 2 in $T_i$ and there are no more intervals in $T_i$ after this match, or for some $i\in\{2,\dots,k\}$, the 0 right after $y_i$ is already matched to some 0 in $T_1$ that is after x (from a later interval in $T_1$). Note that in either case, the interval starting at x (and ending at the 2 after it) is completely unmatched in our matching.

Let $T'_1$ be the sequence with $(L_1 - 1)$ intervals obtained from $T_1$ by removing the interval starting at x. The weight of our matching does not change if we view it as a matching between $T'_1, T_2,\dots,T_k$, which implies that $E_P \le W(L_1 - 1, L_2,\dots,L_k)$. Using our inductive hypothesis, we conclude that

$$E_P \le (L_1 - 1)\cdot E_U + (L_k - L_1 + 1)\cdot(B-1) \le L_1\cdot E_U + (L_k - L_1)\cdot(B-1),$$

since $E_U > B$, and we are done.

The x = 2 case: The 0 at the start of x's interval must have been matched to some 0-symbols $x_i$, one from each string $T_i$. For each $i\in\{2,\dots,k\}$, let $z_i$ be the 2 at the end of $x_i$'s interval. Note that for at least one $i\in\{2,\dots,k\}$, $z_i$ must be matched to some $w = 2$ in $T_1$ after x, since otherwise we can add $(x, z_2,\dots,z_k)$ to the matching, gaining a weight of C. The only possible conflicts this creates are with matches containing symbols (other than 0 or 2) inside the $x_i\to z_i$ interval, for some $i\in\{2,\dots,k\}$, or inside x's interval, and if we remove all such matches, we lose weight at most $(E_B - 2C)$, which is much smaller than the gain of C from the new 2 we matched; this would imply that our matching was not optimal. Let $j\in\{2,\dots,k\}$ be the index of this string, so that in $T_j$ both $x_j$ and $z_j$ are matched. Therefore, there are $c\ge 2$ intervals in $T_1$ that are matched to a single interval in $T_j$: all the intervals starting at the 0 right before x and ending at w are matched to the $x_j\to z_j$ interval. Let $T'_1$ be the sequence obtained from $T_1$ by removing all these c intervals, and let $T'_j$ be the sequence obtained from $T_j$ by removing the $x_j\to z_j$ interval. Similarly, for every $i\in[k]\setminus\{1,j\}$, define $T'_i$ to be the sequence obtained from $T_i$ by removing all the $c_i\ge 1$ intervals starting at $x_i$ and ending at the 2 that is matched with $z_j$. Our matching can be split into two parts: a matching of $T'_1,\dots,T'_k$, and the matching of the $x_j\to z_j$ interval to the removed intervals. The contribution of the latter part to the weight of the matching is at most the weight of all the symbols in an interval, which is $E_B$. Consider the new sequences $T'_1,\dots,T'_k$ and note that for each i, $T'_i$ contains no more than $L_i - 1$ intervals, while the sequence with the fewest intervals, $T'_1$, has no more than $L_1 - c$ intervals. Thus, by definition, any matching of $T'_1,\dots,T'_k$ has weight at most $W(L_1 - c,\dots,L_k - 1)$, and by the inductive hypothesis we can upper bound

$$W(L_1 - c,\dots,L_k - 1) \le (L_1 - c)\cdot E_U + (L_k - 1 - L_1 + c)\cdot(B-1).$$

Summing up the two bounds on the contributions, we get that the total weight of the matching is at most

$$E_P \le E_B + (L_1 - c)\cdot E_U + (L_k - L_1 + c - 1)\cdot(B-1) = L_1\cdot E_U + (L_k - L_1)\cdot(B-1) + (c-1)\cdot(B-1) + E_B - c\cdot E_U.$$

However, note that $E_B < 1.1E_U$ and that $(c - 1.1)E_U > 10(c - 1.1)B > (c-1)B$, which implies that $E_P$ can be upper bounded by $L_1\cdot E_U + (L_k - L_1)\cdot(B-1)$, and we are done. □
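Spelled out, the last step shows that the extra terms are negative, using $c\ge 2$, $E_B < 1.1E_U$, and $E_U \ge 2C > 10B$:

$$(c-1)(B-1) + E_B - cE_U \;<\; (c-1)B - (c-1.1)E_U \;<\; (c-1)B - 10(c-1.1)B \;=\; -(9c-10)B \;<\; 0,$$

where the last inequality holds because $9c - 10 > 0$ for every $c\ge 2$.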

We now turn to bounding (b). Recall that $N_i$ is the number of intervals from $P_i$ that are matched. Let us also define $x_{i-}$ as the number of $3_i$ symbols from $P_i$ that appear before $s_i$ and are matched in our optimal matching, and define $x_{i+}$ to be the number of such $3_i$ symbols that appear after $t_i$. Then, the contribution of (b) to the score can be bounded by $\sum_{i=2}^{k}(x_{i-} + x_{i+})B_i$. A simple but key observation is the following.

Claim 13. For every $i\in\{2,\dots,k\}$,

$$x_{i-} + x_{i+} \le 2(i-1)n + n + 2 - \sum_{j=2}^{i-1}\big(x_{j-} + x_{j+} - 1\big) - N_i.$$

Proof. Focus on $P_i$ and note that there are only $(2(i-1)n + n + 1)$ $3_i$-symbols in it. To make the counting easier, let us define a set U that is initially empty, to which we will add unmatchable $3_i$ symbols from $P_i$. In the end, we will argue that $|U| + x_{i-} + x_{i+}$ must be at most $(2(i-1)n + n + 1)$.

First, we add the $(N_i - 1)$ $3_i$ symbols that lie between $s_i$ and $t_i$ to U, since those are clearly unmatchable.

Second, we focus on the prefix of $P_i$ that ends at $s_i$; call it $Q_i$. For $2\le j<i$, note that there must be $x_{j-}$ $3_j$-symbols in $Q_i$ that are matched, and let $q_j$ be the first such $3_j$ symbol. Since $q_j$ is matched to the first $3_j$ symbol in $P_j$ that is matched, and in $P_j$ there are no $3_h$ symbols, for any $h>j$, between that $3_j$ symbol and $s_j$, we can conclude that for any $j<h<i$, all the $x_{h-}$ $3_h$-symbols in $Q_i$ that are matched are in the subsequence of $Q_i$ starting at $q_h$ and ending at $q_j$. In fact, this implies that all the $x_{h-}$ $3_h$-symbols in $Q_i$ that are matched are in the subsequence of $Q_i$ starting at $q_h$ and ending right before $q_{h-1}$. Thus, for each $2\le h<i$, we can add $x_{h-}$ new $3_i$ symbols to our unmatchable set U: the ones in the latter subsequence.

Finally, we focus on the suffix of $P_i$ that starts at $t_i$, and using similar reasoning we conclude that for each $2\le h<i$, we can add $(x_{h+} - 1)$ new $3_i$ symbols to U.

Thus, we conclude that

$$(N_i - 1) + \sum_{j=2}^{i-1}\big(x_{j-} + x_{j+} - 1\big) + x_{i-} + x_{i+} \le 2(i-1)n + n + 1,$$

which proves the claim. □

For any fixed values $N_1, \ldots, N_k$ satisfying $1 \le N_i \le 2(i-1)n + n$, we can compute the largest possible contribution of part (b). Since $B_i$ is much larger than $B_j$ whenever $i < j$, the optimal score is achieved by making $x_{i-} + x_{i+}$ as large as possible, regardless of the $3_j$ symbols this makes unmatchable for $j > i$. That is, we claim that the optimal score is achieved when each of the inequalities in Claim 13 is saturated, i.e. $x_{i-} + x_{i+} = 2(i-1)n + n + 2 - \sum_{j=2}^{i-1}(x_{j-} + x_{j+} - 1) - N_i$. This holds because if some inequality is not saturated, say the one for $i$, then we can always add at least one $3_i$ symbol to the matching (gaining weight $B_i$) and remove at most one $3_j$ symbol for each $j \in \{i+1, \ldots, k\}$ (losing weight less than $(k-1)B_{i+1} < B_i$), and obtain a valid matching with larger cost, contradicting the optimality of our matching. Therefore, the number of matched $3_i$ symbols is precisely

$$x_{i-} + x_{i+} = 2(i-1)n + n + 2 - \sum_{j=2}^{i-1}\left(x_{j-} + x_{j+} - 1\right) - N_i.$$
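Because the saturated equalities determine each $x_{i-} + x_{i+}$ from the values for smaller indices, they can be evaluated from left to right. The Python sketch below does this for a given vector $(N_1, \ldots, N_k)$. The function name `saturated_counts` and the list encoding `N[i - 1]` $= N_i$ are illustrative choices, not part of the construction.

```python
from typing import Dict, Sequence


def saturated_counts(n: int, N: Sequence[int]) -> Dict[int, int]:
    """Return s_i = x_{i-} + x_{i+} for i = 2..k, assuming every
    inequality of Claim 13 holds with equality.

    N is a hypothetical list encoding with N[i - 1] == N_i; N_1 = N[0]
    is never read by the recurrence.
    """
    k = len(N)
    s: Dict[int, int] = {}
    for i in range(2, k + 1):
        # sum_{j=2}^{i-1} (x_{j-} + x_{j+} - 1), using already computed s_j
        earlier = sum(s[j] - 1 for j in range(2, i))
        s[i] = 2 * (i - 1) * n + n + 2 - earlier - N[i - 1]
    return s


print(saturated_counts(5, [5, 5, 5, 5]))  # {2: 12, 3: 11, 4: 11}
```

With $n = 5$ and $N_1 = \cdots = N_4 = 5$ the recurrence gives $2n + 2$ for $i = 2$ and $2n + 1$ for every later index, which are the values used at the end of the proof.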

We can now formally analyze the tradeoff between (a) and (b), and prove that the optimal 
matching matches exactly n intervals from each sequence. 

Claim 14. In the optimal matching, $N_1 = \cdots = N_k = n$.

Proof. Assume, for contradiction, that the claim does not hold; then we are in one of the following two cases.

Case 1: For some $i \in [k]$, $N_i > n$. In this case, we consider any matching in which $N'_i = n$ intervals are matched in each $P_i$, and in which the values $x_{i-}, x_{i+}$ are chosen optimally for all $i \in \{2, \ldots, k\}$. Let $m$ be an index with $N_m = \max_{j=1}^{k} N_j$, so that $N_m > n$. Clearly, the number of $3_m$ symbols in the new matching is at least $x_{m-} + x_{m+} + (N_m - n)$, i.e. it increases by $N_m - n$. Thus, in the contribution of part (b), we gain a weight of at least $(N_m - n)B_m$. To bound the loss in part (a), let $N_{\min} = \min_{j=1}^{k} N_j$ and note that $N_{\min} \le n$ (since $N_1 \le n$). The new contribution of part (a) is at least $n \cdot E_U$, while in the original matching the contribution was at most $N_{\min} \cdot E_U + (N_m - N_{\min}) \cdot (B - 1)$. Since $E_U > B$, the latter expression is maximized when $N_{\min}$ is as large as possible, i.e. $N_{\min} = n$, so we can upper bound it by $n \cdot E_U + (N_m - n) \cdot (B - 1)$. In total, the loss in part (a) is at most $n \cdot E_U + (N_m - n) \cdot (B - 1) - n \cdot E_U = (N_m - n)(B - 1)$, which is much less than $(N_m - n)B_m$. This contradicts the optimality of our matching.

Case 2: For all $i \in [k]$, $N_i \le n$, but for some $i \in [k]$, $N_i < n$. In this case, we consider any matching in which $N'_i = n$ intervals are matched in each $P_i$, and in which the values $x_{i-}, x_{i+}$ are chosen optimally for all $i \in \{2, \ldots, k\}$. Clearly, for each $i \in \{2, \ldots, k\}$, the number of $3_i$ symbols in the new matching is at least $x_{i-} + x_{i+} - i(n - N_i)$, i.e. it decreases by no more than $i(n - N_i)$. Thus, in the contribution of part (b), we lose a weight of at most $\sum_{i=2}^{k} i(n - N_i)B_i \le kB_2 \sum_{i=2}^{k}(n - N_i)$, but, as we show below, we gain a larger weight in part (a).

Let $N_{\min} = \min_{j=1}^{k} N_j$ and note that $\max_{j=1}^{k} N_j \le n$. By Claim 12, the part (a) contribution of the original matching had weight at most $N_{\min} \cdot E_U + (n - N_{\min}) \cdot (B - 1)$, where $N_{\min} < n$. On the other hand, in the new matching at least $n$ intervals are matched from each string, and therefore the contribution is at least $n \cdot E_U$. Thus, in part (a) we gain at least

$$n \cdot E_U - N_{\min} \cdot E_U - (n - N_{\min}) \cdot (B - 1) = (n - N_{\min})(E_U - B + 1),$$

which is larger than $kB_2(k-1)(n - N_{\min}) \ge kB_2 \sum_{i=2}^{k}(n - N_i)$, since $E_U \gg k^2 B_2$.

□ 
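The comparison at the end of Case 2 uses only the two bounds on the part (b) loss and the part (a) gain. The sketch below checks, by brute force over all small vectors $(N_1, \ldots, N_k)$ that fall into Case 2, that the gain bound exceeds the loss bound. The helper names are hypothetical, and so are the concrete weights: the proof only requires $B_i \le B_2$ for $i \ge 2$ and $E_U \gg k^2 B_2$, which we instantiate as $B_i = B(k+1)^{k-i+1}$ and $E_U = k^2 B_2 + B$.

```python
from itertools import product
from typing import Dict


def hypothetical_weights(k: int, B: int) -> Dict[int, int]:
    """Illustrative weights with B_2 > B_3 > ... > B_k (not the paper's values)."""
    return {i: B * (k + 1) ** (k - i + 1) for i in range(2, k + 1)}


def case2_gain_exceeds_loss(n: int, k: int, B: int,
                            Bw: Dict[int, int], E_U: int) -> bool:
    """Brute-force check of the Case 2 comparison in Claim 14.

    Enumerates every (N_1, ..., N_k) in {1..n}^k with some N_i < n and
    verifies that the part (a) gain bound exceeds the part (b) loss bound.
    """
    for N in product(range(1, n + 1), repeat=k):
        n_min = min(N)
        if n_min == n:
            continue  # all N_i = n: not Case 2
        # Upper bound on the part (b) loss: sum_{i=2}^{k} i (n - N_i) B_i
        loss = sum(i * (n - N[i - 1]) * Bw[i] for i in range(2, k + 1))
        # Lower bound on the part (a) gain: (n - N_min)(E_U - B + 1)
        gain = (n - n_min) * (E_U - B + 1)
        if gain <= loss:
            return False
    return True


if __name__ == "__main__":
    k, B = 3, 4
    Bw = hypothetical_weights(k, B)
    # E_U = k^2 B_2 + B gives E_U - B + 1 > k(k-1) B_2
    print(case2_gain_exceeds_loss(4, k, B, Bw, E_U=k * k * Bw[2] + B))  # True
    # With E_U = B the gain bound is far too small
    print(case2_gain_exceeds_loss(4, k, B, Bw, E_U=B))  # False
```

The second call illustrates why the proof needs $E_U$ to dominate $k^2 B_2$: without that gap, the part (a) gain no longer pays for the part (b) loss.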

Finally, having proved that $N_1 = \cdots = N_k = n$, we know the exact contribution of both parts. For part (b), by Claim 13 and the optimality conditions on the values $x_{i-}, x_{i+}$, we get $x_{2-} + x_{2+} = 2n + 2$ and $x_{i-} + x_{i+} = 2n + 1$ for $i \in \{3, \ldots, k\}$, so the total contribution is exactly $B_2 + (2n + 1) \cdot \sum_{i=2}^{k} B_i$. For part (a), by Claim 12, the total contribution is $n \cdot E_U$. Combined, the total score of our optimal matching is exactly $n \cdot E_U + B_2 + (2n + 1) \cdot \sum_{i=2}^{k} B_i = E_q$.
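The two values for part (b) follow from the saturated equality with $N_2 = \cdots = N_k = n$ by induction on $i$. The last line additionally assumes, as the stated total indicates, that part (b) contributes $\sum_{i=2}^{k}(x_{i-} + x_{i+})B_i$.

```latex
\begin{align*}
x_{2-} + x_{2+} &= 2n + n + 2 - n = 2n + 2, \\
x_{i-} + x_{i+} &= 2(i-1)n + n + 2 - \bigl[(2n + 1) + (i-3) \cdot 2n\bigr] - n = 2n + 1
  \quad (3 \le i \le k,\ \text{given } x_{j-} + x_{j+} = 2n + 1 \text{ for } 3 \le j < i), \\
\sum_{i=2}^{k} (x_{i-} + x_{i+}) B_i &= (2n + 2) B_2 + (2n + 1) \sum_{i=3}^{k} B_i
  = B_2 + (2n + 1) \sum_{i=2}^{k} B_i.
\end{align*}
```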

□

Note that the length of the sequences is $O(n \cdot d^{O(k)})$, while the largest weight used is $O(k^{O(k)} d^{O(k)})$; thus Lemma 13 implies the claimed bound.

□ 